Parallelize the process #8

Open
labra opened this issue Jul 14, 2021 · 1 comment

labra commented Jul 14, 2021

At the moment the dump processor appears to run sequentially and uses only one of the 60 cores on our server.

This can be seen with htop.

It would be nice to investigate how we can parallelize the dump processing code. Some questions that I would like to answer:

  • Is it possible to parallelize the processing of gzip files? At this moment, the processor extends EntityTimerProcessor. Can that processor work in parallel? Could we use another approach for gzipped files?
  • Ideally, I would like to use cats.effect.IO or fs2, but I am not sure whether this has been tried before, or even whether it is possible.
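
As a starting point for the gzip question, here is a minimal sketch of one possible approach. It does not use EntityTimerProcessor, cats.effect.IO or fs2, only the JVM standard library. A single gzip stream generally has to be decompressed sequentially, so the sketch keeps decompression on one thread and hands batches of lines to a bounded worker pool. It assumes the dump holds one record per line and that records can be processed independently. The names ParallelGzipLines, sumOverLines and perLine are hypothetical and only serve the illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.ToLongFunction;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: sequential gunzip, parallel per-line work.
final class ParallelGzipLines {

    /** Gzip a string in memory; only used to build demo input. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads lines on the calling thread and sums perLine over them using `workers` threads. */
    static long sumOverLines(InputStream gzipped, int workers, int batchSize,
                             ToLongFunction<String> perLine) {
        // Bounded queue + CallerRunsPolicy: when the workers fall behind, the
        // reading thread processes a batch itself, which throttles reading.
        ExecutorService pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(workers * 2),
                new ThreadPoolExecutor.CallerRunsPolicy());
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            List<Future<Long>> pending = new ArrayList<>();
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() >= batchSize) {
                    pending.add(submit(pool, batch, perLine));
                    batch = new ArrayList<>(batchSize);
                }
            }
            if (!batch.isEmpty()) {
                pending.add(submit(pool, batch, perLine));
            }
            long total = 0;
            for (Future<Long> partial : pending) {
                total += partial.get();
            }
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Submits one batch; the task returns the partial sum for that batch. */
    private static Future<Long> submit(ExecutorService pool, List<String> batch,
                                       ToLongFunction<String> perLine) {
        return pool.submit(() -> {
            long sum = 0;
            for (String s : batch) {
                sum += perLine.applyAsLong(s);
            }
            return sum;
        });
    }

    public static void main(String[] args) {
        // Demo on in-memory data: total number of characters over all lines.
        byte[] demo = gzip("Q1\nQ22\nQ333\n");
        System.out.println(sumOverLines(new ByteArrayInputStream(demo), 4, 2, String::length));
    }
}
```

Whether this helps depends on where the time goes: if decompression itself is the bottleneck, the extra workers will mostly sit idle, and splitting the input into several files would be the more promising route.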

labra commented Nov 9, 2022

Another option discussed during the biohackathon would be to split the source dump into as many parts as there are CPUs, run one Docker container per part in parallel on the different CPUs, and join the results. @nilshoffmann
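
A rough sketch of the split, run and join idea, again using only the JVM standard library. It assumes an already decompressed dump with one record per line, and that the per-part results can simply be concatenated; both are assumptions, not something confirmed for this project. SplitDump, the part-N.txt file names, the /data mount point and the arguments passed to the image are hypothetical. The docker flags used (--rm, --cpuset-cpus, -v) are standard docker run options.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for the split / run in parallel / join workflow.
final class SplitDump {

    /** Distributes the lines of `source` round-robin over `parts` files in `outDir`. */
    static List<Path> split(Path source, Path outDir, int parts) {
        List<Path> paths = new ArrayList<>();
        List<BufferedWriter> writers = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8)) {
            for (int i = 0; i < parts; i++) {
                Path part = outDir.resolve("part-" + i + ".txt");
                paths.add(part);
                writers.add(Files.newBufferedWriter(part, StandardCharsets.UTF_8));
            }
            String line;
            long n = 0;
            while ((line = reader.readLine()) != null) {
                BufferedWriter w = writers.get((int) (n++ % parts));
                w.write(line);
                w.newLine();
            }
            for (BufferedWriter w : writers) {
                w.close(); // flush each part; errors surface as IOException
            }
            return paths;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Best-effort cleanup on failure; closing twice is harmless.
            for (BufferedWriter w : writers) {
                try { w.close(); } catch (IOException ignored) { }
            }
        }
    }

    /** Builds a `docker run` command for one part, pinned to the given CPU. */
    static List<String> dockerCommand(Path part, int cpu, String image) {
        Path dir = part.toAbsolutePath().getParent();
        return List.of("docker", "run", "--rm",
                "--cpuset-cpus", String.valueOf(cpu),
                "-v", dir + ":/data",
                image, "/data/" + part.getFileName());
    }

    /** Starts all commands at once and waits until every one has finished. */
    static void runAll(List<List<String>> commands) {
        try {
            List<Process> running = new ArrayList<>();
            for (List<String> command : commands) {
                running.add(new ProcessBuilder(command).inheritIO().start());
            }
            for (Process p : running) {
                if (p.waitFor() != 0) {
                    throw new IllegalStateException("a part failed with exit code " + p.exitValue());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Joins per-part result files by plain concatenation, in list order. */
    static void join(List<Path> partResults, Path target) {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path p : partResults) {
                Files.copy(p, out);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Round-robin splitting keeps the parts roughly the same size but does not preserve the original record order after joining. If order matters, contiguous chunks would be the safer choice.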
