Lumos v0.0.3 - Changelog
Enhancement to existing QC pipeline
Performance
1. Modified pipeline
What
The resizing operation of the images in the pipeline was moved to be performed before all other computations on the images.
Why
This resizing was previously performed after all other operations. With a resizing factor that is usually `0.1`, this was a big opportunity for performance and storage efficiency improvements. As all other computations are element-wise, moving the resizing operation should have no impact on the output, and was an obvious improvement to make.
This would both accelerate the computation and reduce the temporary storage space taken by the program.
How
The resizing operation now runs right after the images are loaded, instead of after all other computations (such as the conversion of the images from 16-bit to 8-bit).
Compare the changes in the code for more details.
Side effects
There should be no significant side effects on the final render of the images. All other operations performed on the image are element-wise, so resizing it before or after them should not have any impact on the output.
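As a rough illustration of why the order does not matter for element-wise operations, here is a minimal pure-Python sketch. The helper names (`to_8bit`, `downsample`) and the synthetic image are hypothetical, and nearest-neighbour downsampling is an assumption made for simplicity. With an interpolating resize, the two orders would be expected to agree only up to rounding.

```python
def to_8bit(image):
    """Element-wise 16-bit -> 8-bit conversion (keeps the high byte of each pixel)."""
    return [[pixel >> 8 for pixel in row] for row in image]


def downsample(image, step):
    """Nearest-neighbour downsampling: keep every `step`-th pixel in both directions."""
    return [row[::step] for row in image[::step]]


# Hypothetical 16-bit image of 100 x 100 pixels.
image = [[(x * 601 + y * 257) % 65536 for x in range(100)] for y in range(100)]

resize_last = downsample(to_8bit(image), 10)   # old order: 10,000 pixels converted
resize_first = to_8bit(downsample(image, 10))  # new order: only 100 pixels converted

same_output = resize_first == resize_last
print(same_output)
```

With a factor of 0.1 in each direction, the element-wise steps in this sketch touch 100 times fewer pixels when the resize comes first.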
2. Parallelism
What
Lumos can now utilize multiple CPU cores to perform the computation of each of a plate's channels in parallel.
Why
One of the main requested features was performance improvements. To this end, parallelism seemed like a fairly simple solution, as Python scripts only use one CPU core by default.
How
The `multiprocessing` module from the standard library was used to implement this feature. Worker processes are spawned from the parallelized functions using the `Pool` object.
To keep concurrency manageable, the only parallelized function is currently `render_single_plate_plateview` (its parallel form being `render_single_plate_plateview_parallelism`). It spawns a new process for each of the channels of the current plate to be rendered. The `render_single_run_plateview` function relies on one form or the other, depending on whether the user requested more than one core for the execution of the program.
Compare the changes in the code for more details.
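The dispatch between the serial and parallel forms can be sketched as follows. This is not the Lumos code: `render_channel`, `render_plate` and the channel names are hypothetical stand-ins, and only `multiprocessing.Pool` comes from the description above.

```python
from multiprocessing import Pool


def render_channel(channel):
    """Hypothetical stand-in for the work done on one channel of a plate."""
    return f"{channel}: rendered"


def render_plate(channels, n_cores):
    """Render every channel, in parallel when more than one core is requested."""
    if n_cores > 1:
        # One task per channel, spread over at most n_cores worker processes.
        with Pool(processes=n_cores) as pool:
            return pool.map(render_channel, channels)
    # Serial fallback when a single core is requested.
    return [render_channel(channel) for channel in channels]


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing this module.
    print(render_plate(["DNA", "ER", "RNA"], n_cores=2))
```

`pool.map` returns results in the order of its inputs, so both forms produce the same list.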
Side effects
Keyboard interrupts (Ctrl+C) do not work when parallelism is enabled. The only way to halt the execution of the program prematurely is to close the terminal.
In the Windows Terminal (this did not occur in the Ubuntu terminal during testing), the tqdm progress bars that track the progress of the program sometimes get printed on a new line, which makes the console illegible. To counteract this, during a parallel execution of the program, output is limited to a single progress bar tracking the plates of a run being processed.
Logging also breaks when running several processes in parallel. This is because they are all trying to write to the same file at the same time. To prevent errors from happening in the middle of the program's execution, logs are disabled when parallelism is enabled.
Functionality
1. Improved Command-Line Interface (CLI)
What
a. Some arguments have been given aliases/shortcuts to make the Lumos commands clearer and more succinct.
b. New arguments have been introduced:
`--brightfield`/`-b`
`--parallelism`/`-p`
c. A debug argument, `--keep-temp-files`, has been added but should not be used by end-users.
Why
a. It makes the CLI cleaner to use, and was very easy to do.
b. `--brightfield` controls which Brightfield channels get rendered. This feature was requested because Brightfield channels are not often used in QC, and omitting them from the rendering process reduces the total execution time of the program. `--parallelism` controls how many CPU cores are used for parallelism.
c. This argument has been added to speed up on-the-fly testing during development.
How
All of these modifications were implemented by changing the `click` interface of the program.
Refer to the `readme.md` documentation for more details on the refactored and newly introduced arguments.
Side effects
Those modifications have no side effects.
However, the CLI changes related to the new Cell Painting mode break compatibility with the previous command-line interface of Lumos. These changes and their effects are detailed in the relevant section of this document.
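For readers unfamiliar with `click`, the new arguments and their short aliases could be declared roughly as in the sketch below. The option names come from this changelog. The value types, defaults and help texts are assumptions, and `cli` is a hypothetical command name. Refer to `readme.md` for the actual definitions.

```python
import click
from click.testing import CliRunner


@click.command()
@click.option("--brightfield", "-b", default="all",
              help="Which Brightfield channels to render (assumed type and default).")
@click.option("--parallelism", "-p", type=int, default=1,
              help="Number of CPU cores to use (assumed default).")
@click.option("--keep-temp-files", is_flag=True,
              help="Debug only: keep temporary files.")
def cli(brightfield, parallelism, keep_temp_files):
    # Echo the parsed values so the effect of the short aliases is visible.
    click.echo(f"brightfield={brightfield} parallelism={parallelism} "
               f"keep_temp_files={keep_temp_files}")


# Invoke the command in-process using the short aliases.
result = CliRunner().invoke(cli, ["-b", "none", "-p", "4"])
print(result.output)
```

Each `click.option` call lists the long and short spellings side by side, which is all that is needed to add an alias.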
2. Missing images markers
What
This feature makes images that are missing from the database, or that may have been corrupted during the copying process, far more visible than before.
It adds custom markers to each of those missing images, on top of a solid background, and the result serves as a placeholder for those missing images through the rest of the QC pipeline.
Why
This improvement was requested because missing files were previously replaced by a plain solid grey placeholder image. This was not very visible, and a more visually distinctive placeholder was needed.
How
A new function, `draw_markers`, has been added to the `lumos/toolbox.py` module. This function first computes the properties of the shapes to draw on the input image and then draws them in the specified input color.
This function is used after a dummy placeholder image is created when the loading of a site image fails.
Compare the changes in the code for more details.
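The idea of a marked placeholder can be illustrated with a small pure-Python sketch. It does not reproduce `draw_markers`: the helper names, the diagonal cross shape, and the grey and white values are all assumptions chosen for illustration.

```python
def make_placeholder(size, background=64):
    """Create a solid square grayscale image as a nested list (hypothetical helper)."""
    return [[background] * size for _ in range(size)]


def draw_cross(image, color=255):
    """Draw both diagonals of a square image in the given color (assumed marker shape)."""
    size = len(image)
    for i in range(size):
        image[i][i] = color             # main diagonal
        image[i][size - 1 - i] = color  # anti-diagonal
    return image


placeholder = draw_cross(make_placeholder(5))
for row in placeholder:
    print(" ".join(f"{pixel:3d}" for pixel in row))
```

The marked image then takes the place of the missing site image for the rest of the pipeline.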
Side effects
There are no side effects to this. This is purely cosmetic.
However, this cannot be implemented for the new Cell Painting operation mode, as the markers would go through the RGB blending pipeline and produce distracting images.
3. Upgraded logger
What
a. The logger now writes to a new file when the current one reaches 2 MB. (Note: a better behavior, however, would be to keep only the last 2 MB of logs in a single file. This does not appear to be easy to implement with the existing `logging` standard Python package.)
b. The logger instance also changes behavior depending on whether the program is running with parallelism.
Why
a. This allows the user to freely delete older logs if wanted, while keeping the most recent ones.
b. As detailed in the previous section on parallelism, the logging functionality breaks when using parallelism, so logs are disabled in that case.
How
a. A new `RotatingFileHandler` from the `logging` library (`logging.handlers`) is used to define the behavior of the logger.
b. A new `lumos/logger.py` module has been created to allow the logger to have an internal state variable. This state variable is a boolean that indicates whether the Lumos program currently uses parallelism. If so, no logs are stored, and selected prints are sent to the console.
Compare the changes in the code for more details.
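A minimal sketch of such a logger is shown below. It is not the content of `lumos/logger.py`: `build_logger`, the logger name and the `backupCount` value are assumptions, and the parallelism state is passed as a parameter here rather than kept as an internal variable.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

MAX_LOG_BYTES = 2 * 1024 * 1024  # 2 MB, the rollover size mentioned above


def build_logger(log_path, parallelism_enabled):
    """Return a logger that rotates at 2 MB, or stores nothing under parallelism."""
    logger = logging.getLogger("lumos_sketch")  # hypothetical logger name
    # Drop handlers left over from a previous call so they do not accumulate.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    if parallelism_enabled:
        # Several processes would write to the same file: store no logs at all.
        logger.addHandler(logging.NullHandler())
    else:
        # backupCount must be greater than 0 for rollover to happen; 5 is assumed.
        logger.addHandler(
            RotatingFileHandler(log_path, maxBytes=MAX_LOG_BYTES, backupCount=5)
        )
    return logger


log_path = os.path.join(tempfile.mkdtemp(), "lumos.log")
logger = build_logger(log_path, parallelism_enabled=False)
logger.info("plate rendered")
```

When the file reaches `maxBytes`, the handler renames it with a numeric suffix and starts a new one, which is why older logs can be deleted freely.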
Side effects
There are no side effects per se to this implementation, as everything works as intended: in some cases logs won't get written to the log file, or prints won't get printed to the console, but this is the desired behavior.
New implementation of Cell Painting
What
A Cell Painting mode is now included in Lumos. It allows the user to render an RGB image of a plate with all its channels color-coded and blended together.
The style and algorithm used for the blending can be chosen by the user, and an "accurate" style is provided to try to match the emission wavelength of each channel when colorizing them.
Why
A Cell Painting mode was requested for 2 reasons:
How
The basic pipeline for Cell Painting was taken from the previous exploratory work carried out by Nicolas Boisseau (@nicolasboisseau) and was built upon.
Most notably, an algorithm was designed to blend the different channels of each plate together in a way that is as close as possible to the actual emission wavelength of each channel.
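The general idea of color-coding channels and blending them into one RGB image can be sketched as follows. This is not the Lumos algorithm, which is not detailed here: the additive blending, the tint colors and the channel names are all assumptions chosen for illustration.

```python
def colorize(channel, tint):
    """Turn a grayscale channel (values 0-255) into RGB by scaling a tint color."""
    return [[tuple(pixel * c // 255 for c in tint) for pixel in row] for row in channel]


def blend(colored_channels):
    """Additively blend same-sized RGB images, clipping each component at 255."""
    height = len(colored_channels[0])
    width = len(colored_channels[0][0])
    return [
        [
            tuple(min(255, sum(img[y][x][k] for img in colored_channels)) for k in range(3))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two hypothetical 2 x 2 channels with assumed tints (blue and green).
dna = [[255, 0], [128, 0]]
actin = [[0, 255], [128, 0]]
tints = {"dna": (0, 0, 255), "actin": (0, 255, 0)}

rgb = blend([colorize(dna, tints["dna"]), colorize(actin, tints["actin"])])
print(rgb)
```

A style such as the "accurate" one would then amount to a different choice of tint per channel.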
Side effects
The `$ lumos` command has changed syntax. This is a breaking change.
To select the operation mode, its identifier must be given after the `lumos` keyword, before any arguments.
The two operation modes' identifiers are:
`qc` for the Quality Control (legacy) mode
`cp` for the Cell Painting (new) mode
E.g.
$ lumos qc --scope run --source-path ./source/run1 --output-path ./output/run1
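A mode-first syntax like the one above maps naturally onto a `click` command group. The sketch below shows one way this could look. Whether Lumos is structured this way is an assumption. The option names come from the example command, and the function bodies are placeholders.

```python
import click
from click.testing import CliRunner


@click.group()
def lumos():
    """Entry point: the operation mode comes right after `lumos`."""


@lumos.command("qc")
@click.option("--scope")
@click.option("--source-path")
@click.option("--output-path")
def qc(scope, source_path, output_path):
    # Placeholder body for the Quality Control (legacy) mode.
    click.echo(f"QC mode: scope={scope}, {source_path} -> {output_path}")


@lumos.command("cp")
def cp():
    # Placeholder body for the Cell Painting (new) mode.
    click.echo("Cell Painting mode")


# Invoke the group in-process with the arguments of the example command.
result = CliRunner().invoke(
    lumos,
    ["qc", "--scope", "run", "--source-path", "./source/run1",
     "--output-path", "./output/run1"],
)
print(result.output)
```

With a group, calling `lumos` without a mode identifier no longer runs a pipeline, which is consistent with the change being a breaking one.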
For more information on the new operating mode and its associated arguments, please refer to the `readme.md` documentation.
Other work and changes
HTML documentation
What
More thorough in-code documentation (docstrings) has been written throughout the codebase, and HTML documentation has been generated in the `/docs` folder. The latter can be viewed in any web browser by opening the `docs/index.html` file.
Why
Simple and clean documentation of the codebase makes the onboarding of new developers wishing to improve or maintain the program easier.
How
The HTML documentation was generated from the new in-code documentation using the `pdoc3` package.
Side effects
There are no side effects to this.
Testing
What
Two new tests were added to the `/tests` folder to check that both the Quality Control and Cell Painting pipelines still work as expected during development:
`test_2_qc_pipeline.py`
`test_3_cp_pipeline.py`
Why
These tests can be run during development to verify that modifications made to the codebase do not break the basic functionality of the program.
How
These tests are written for, and intended to be run with, the `pytest` package.
For the exact implementation of these tests, please refer to their respective script files.
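For orientation, a pipeline smoke test in the pytest style might look like the sketch below. It does not reproduce the actual test scripts: `run_pipeline` is a hypothetical stand-in for a Lumos pipeline, and the file names are invented for the example. Only the `tmp_path` fixture is a standard pytest feature.

```python
from pathlib import Path


def run_pipeline(source_dir: Path, output_dir: Path) -> None:
    """Hypothetical stand-in for a pipeline: writes one output file per input file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for source_file in sorted(source_dir.iterdir()):
        (output_dir / f"{source_file.stem}_render.txt").write_text("rendered")


def test_pipeline_produces_one_output_per_input(tmp_path):
    # tmp_path is a temporary directory (a pathlib.Path) provided by pytest.
    source_dir = tmp_path / "source"
    source_dir.mkdir()
    (source_dir / "plate1.tif").write_bytes(b"")
    output_dir = tmp_path / "output"

    run_pipeline(source_dir, output_dir)

    assert [path.name for path in output_dir.iterdir()] == ["plate1_render.txt"]
```

Saved in a file whose name starts with `test_`, such a function is collected and run by the `pytest` command.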
Side effects
There are no side effects to this.