* ready for HPC
* processed all segs
* rerun organoid segs on HPC
* fixed HPC script
* fixed HPC script
* update run list
* update run list
* update run list
* update run list
* update run list
* segmentations re-completed
* Update 2.segment_images/scripts/0.nuclei_segmentation.py

  Co-authored-by: Dave Bunten <ekgto445@gmail.com>
* addressing comments

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>
Pull request overview
This PR implements organoid image segmentation and classification infrastructure, including:
- A morphology-dependent segmentation system with updated label handling
- Machine learning models (logistic regression and random forest) to classify organoid morphology
- A web-based annotation tool for labeling organoid images
- Feature extraction pipelines using SAM-Med3D and MorphEm models
- Scripts for preprocessing, training, prediction, and visualization
Changes:
- Added morphology-aware segmentation logic supporting labels: globular, cluster, small, dissociated, elongated, blank, and failed
- Created annotation tool with Flask-based UI for manual image labeling
- Implemented model training and prediction pipelines for organoid classification
- Added numerous utility scripts for preprocessing, feature extraction, and quality checking
Reviewed changes
Copilot reviewed 30 out of 46 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/segmentation_utils/src/cell_segmentation.py | Updated segmentation logic to handle new organoid labels (contains critical bug) |
| src/file_utils/src/arg_parsing_utils.py | Added output_features_subparent_name argument |
| src/featurization_utils/src/sammed3d_featurizer.py | Added logging suppression (stdout not restored) |
| 2.segment_images/scripts/*.py | Added 8 new scripts for training, prediction, preprocessing, and checking |
| 2.segment_images/annotation_tool/* | New web-based annotation tool with Flask |
| 3.cellprofiling/* | Deleted README and shell script (moved/refactored) |
| .gitignore, .pre-commit-config.yaml | Updated configuration files |
        import tqdm.notebook as tqdm
    else:
        import tqdm
    image_base_dir
The variable image_base_dir is used as a statement on line 30, which has no effect. This appears to be leftover debugging code that should be removed.
        output_dict["label"].append(row["predicted_label"])
        output_dict["predicted_or_gt"].append("predicted")
    else:
        output_dict["label"].append(row["label_name"])
        output_dict["predicted_or_gt"].append("gt")
The logic for determining predicted vs. ground-truth labels appears inverted. When predicted_label is NaN (missing), the code appends the missing predicted_label and marks the row as "predicted". When predicted_label is not NaN, it appends label_name and marks the row as "gt" (ground truth). Rows with an actual prediction should use the predicted label and be marked "predicted"; rows without one should fall back to label_name and be marked "gt".
    -    output_dict["label"].append(row["predicted_label"])
    -    output_dict["predicted_or_gt"].append("predicted")
    -else:
    -    output_dict["label"].append(row["label_name"])
    -    output_dict["predicted_or_gt"].append("gt")
    +    # No prediction available: use ground-truth label
    +    output_dict["label"].append(row["label_name"])
    +    output_dict["predicted_or_gt"].append("gt")
    +else:
    +    # Prediction available: use predicted label
    +    output_dict["label"].append(row["predicted_label"])
    +    output_dict["predicted_or_gt"].append("predicted")
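For reference, the corrected branch order could be sketched as a standalone function. This is only an illustration, not the PR's code: the `is_missing` helper, the `collect_labels` function, and the sample rows are hypothetical, and plain dictionaries stand in for the DataFrame rows the script presumably iterates over. Only the keys `predicted_label`, `label_name`, `label`, and `predicted_or_gt` come from the diff.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (assumed stand-in for the script's NaN check)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def collect_labels(rows):
    """Prefer the predicted label; fall back to ground truth when it is missing."""
    output_dict = {"label": [], "predicted_or_gt": []}
    for row in rows:
        if is_missing(row["predicted_label"]):
            # No prediction available: use ground-truth label
            output_dict["label"].append(row["label_name"])
            output_dict["predicted_or_gt"].append("gt")
        else:
            # Prediction available: use predicted label
            output_dict["label"].append(row["predicted_label"])
            output_dict["predicted_or_gt"].append("predicted")
    return output_dict


# Hypothetical rows: one annotated by hand, one labeled only by the model.
rows = [
    {"label_name": "globular", "predicted_label": float("nan")},
    {"label_name": None, "predicted_label": "elongated"},
]
result = collect_labels(rows)
```

With this ordering, every entry in `label` is a real label, and `predicted_or_gt` records where it came from.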
    # Suppress logging and stdout
    import sys

    logging.getLogger("transformers").setLevel(logging.ERROR)
    logging.getLogger("torch").setLevel(logging.ERROR)
    old_stdout = sys.stdout
    sys.stdout = StringIO()
The stdout redirection to StringIO is never restored. After line 81, sys.stdout is redirected but there's no corresponding code to restore it to old_stdout. This will suppress all print statements for the rest of the program execution, which could hide important error messages or logs. Add sys.stdout = old_stdout after the model loading is complete.
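One way to make the restoration automatic is the standard library's `contextlib.redirect_stdout`, which puts `sys.stdout` back when the block exits, even if loading raises. The sketch below is a minimal illustration under assumptions: `load_model_quietly` and `noisy_loader` are hypothetical names, and `noisy_loader` merely stands in for whatever call the featurizer uses to load its model.

```python
import contextlib
import io
import sys


def load_model_quietly(loader):
    """Call loader() with stdout captured, then return the model and the captured text."""
    buffer = io.StringIO()
    # redirect_stdout restores the previous sys.stdout on exit,
    # including when loader() raises an exception.
    with contextlib.redirect_stdout(buffer):
        model = loader()
    return model, buffer.getvalue()


def noisy_loader():
    # Hypothetical stand-in for the real model-loading call.
    print("loading weights...")
    return "model"


stdout_before = sys.stdout
model, captured = load_model_quietly(noisy_loader)
```

Returning the captured text rather than discarding it keeps any messages available for debugging if the load misbehaves.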
wli51 left a comment
Nice PR! Love the utility UI you've made. The only thing that stood out to me was the potential bug with blank morphology in src/segmentation_utils/src/cell_segmentation.py. Otherwise LGTM.
This pull request introduces significant improvements to the image segmentation and annotation workflows, including the addition of a new lightweight image annotation tool, updates to environment and dependency management, and enhancements to segmentation scripts. The changes streamline the featurization process, improve testing coverage, and provide detailed documentation for new and existing features.
This PR runs segmentation and everything segmentation-related, including a tool for annotating the "morphology" of organoids.
This work is incomplete; I need to merge it into the repo and then refactor the repo to account for software gardening changes.