Train organoid label model #129

Open
MikeLippincott wants to merge 35 commits into WayScience:main from MikeLippincott:train_organoid_label_model

@MikeLippincott MikeLippincott commented Feb 17, 2026

This pull request introduces significant improvements to the image segmentation and annotation workflows, including the addition of a new lightweight image annotation tool, updates to environment and dependency management, and enhancements to segmentation scripts. The changes streamline the featurization process, improve testing coverage, and provide detailed documentation for new and existing features.

This PR runs segmentation and everything related to it, including a tool for annotating the "morphology" of organoids.

This work is not complete: I need to merge it into the repo and then refactor the repo to account for software gardening changes.

MikeLippincott and others added 30 commits January 13, 2026 14:07
* ready for hPC

* processed all segs

* rerun organoid segs on HPC

* fixed HPC script

* fixed HPC script

* update run list

* update run list

* update run list

* update run list

* update run list

* segmentations re-completed

* Update 2.segment_images/scripts/0.nuclei_segmentation.py

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

* addressing comments

---------

Co-authored-by: Dave Bunten <ekgto445@gmail.com>

Copilot AI left a comment

Pull request overview

This PR implements organoid image segmentation and classification infrastructure, including:

  • A morphology-dependent segmentation system with updated label handling
  • Machine learning models (logistic regression and random forest) to classify organoid morphology
  • A web-based annotation tool for labeling organoid images
  • Feature extraction pipelines using SAM-Med3D and MorphEm models
  • Scripts for preprocessing, training, prediction, and visualization

Changes:

  • Added morphology-aware segmentation logic supporting labels: globular, cluster, small, dissociated, elongated, blank, and failed
  • Created annotation tool with Flask-based UI for manual image labeling
  • Implemented model training and prediction pipelines for organoid classification
  • Added numerous utility scripts for preprocessing, feature extraction, and quality checking

Reviewed changes

Copilot reviewed 30 out of 46 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/segmentation_utils/src/cell_segmentation.py | Updated segmentation logic to handle new organoid labels (contains critical bug) |
| src/file_utils/src/arg_parsing_utils.py | Added output_features_subparent_name argument |
| src/featurization_utils/src/sammed3d_featurizer.py | Added logging suppression (stdout not restored) |
| 2.segment_images/scripts/*.py | Added 8 new scripts for training, prediction, preprocessing, and checking |
| 2.segment_images/annotation_tool/* | New web-based annotation tool with Flask |
| 3.cellprofiling/* | Deleted README and shell script (moved/refactored) |
| .gitignore, .pre-commit-config.yaml | Updated configuration files |

    import tqdm.notebook as tqdm
else:
    import tqdm
image_base_dir

Copilot AI Feb 17, 2026

The variable image_base_dir is used as a statement on line 30, which has no effect. This appears to be leftover debugging code that should be removed.

Comment on lines +155 to +159
    output_dict["label"].append(row["predicted_label"])
    output_dict["predicted_or_gt"].append("predicted")
else:
    output_dict["label"].append(row["label_name"])
    output_dict["predicted_or_gt"].append("gt")

Copilot AI Feb 17, 2026

The logic for choosing between predicted and ground-truth labels is inverted. When predicted_label is NaN (missing), the code appends that missing value and marks the row as "predicted". When predicted_label is present, it appends label_name and marks the row as "gt" (ground truth). This is backwards: rows with an actual prediction should be marked "predicted", and rows without one should fall back to the ground-truth label and be marked "gt".

Suggested change
-     output_dict["label"].append(row["predicted_label"])
-     output_dict["predicted_or_gt"].append("predicted")
- else:
-     output_dict["label"].append(row["label_name"])
-     output_dict["predicted_or_gt"].append("gt")
+     # No prediction available: use ground-truth label
+     output_dict["label"].append(row["label_name"])
+     output_dict["predicted_or_gt"].append("gt")
+ else:
+     # Prediction available: use predicted label
+     output_dict["label"].append(row["predicted_label"])
+     output_dict["predicted_or_gt"].append("predicted")
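
For reference, the corrected branch order can be checked in isolation with a minimal sketch. It assumes the `if` condition tests whether `predicted_label` is missing, as the review comment describes. The helper names `is_missing` and `select_labels` and the sample rows are hypothetical, and plain dictionaries stand in for the DataFrame rows so the example needs only the standard library (with pandas, `pd.isna` would likely play the role of `is_missing`).

```python
import math


def is_missing(value):
    """Return True for None or a float NaN (hypothetical stand-in for pandas.isna)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def select_labels(rows):
    """Use the predicted label when present, else fall back to the ground truth."""
    output_dict = {"label": [], "predicted_or_gt": []}
    for row in rows:
        if is_missing(row["predicted_label"]):
            # No prediction available: use the ground-truth label
            output_dict["label"].append(row["label_name"])
            output_dict["predicted_or_gt"].append("gt")
        else:
            # Prediction available: use the predicted label
            output_dict["label"].append(row["predicted_label"])
            output_dict["predicted_or_gt"].append("predicted")
    return output_dict


# Hypothetical rows: one with only a ground-truth label, one with a prediction.
rows = [
    {"label_name": "globular", "predicted_label": float("nan")},
    {"label_name": None, "predicted_label": "cluster"},
]
result = select_labels(rows)
```

With these two rows, the first falls back to its ground-truth label and the second keeps its prediction, so no missing value ends up in the `label` column.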

Comment on lines +75 to +81
# Suppress logging and stdout
import sys

logging.getLogger("transformers").setLevel(logging.ERROR)
logging.getLogger("torch").setLevel(logging.ERROR)
old_stdout = sys.stdout
sys.stdout = StringIO()

Copilot AI Feb 17, 2026

The stdout redirection to StringIO is never restored. After line 81, sys.stdout is redirected but there's no corresponding code to restore it to old_stdout. This will suppress all print statements for the rest of the program execution, which could hide important error messages or logs. Add sys.stdout = old_stdout after the model loading is complete.
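
One way to guarantee the restore, sketched here as an alternative to the manual `sys.stdout = old_stdout` assignment, is the standard library's `contextlib.redirect_stdout`, which puts `sys.stdout` back when the block exits, even if loading raises. The names `load_quietly` and `fake_loader` are hypothetical stand-ins; the actual model-loading call in `sammed3d_featurizer.py` is not shown in this thread.

```python
import contextlib
import io
import logging
import sys


def load_quietly(loader):
    """Run loader() with stdout captured; stdout is restored on exit."""
    logging.getLogger("transformers").setLevel(logging.ERROR)
    logging.getLogger("torch").setLevel(logging.ERROR)
    buffer = io.StringIO()
    # redirect_stdout restores the previous sys.stdout even if loader() raises
    with contextlib.redirect_stdout(buffer):
        result = loader()
    return result, buffer.getvalue()


def fake_loader():
    # Hypothetical stand-in for the noisy model-loading call
    print("loading weights...")
    return "model"


stdout_before = sys.stdout
model, captured = load_quietly(fake_loader)
```

Returning the captured text rather than discarding it keeps the suppressed output available for debugging if loading misbehaves.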

@MikeLippincott MikeLippincott requested a review from wli51 February 19, 2026 21:21

@wli51 wli51 left a comment

Nice PR! Love the utility UI you've made. The only thing that stood out to me was the potential bug with blank morphology in src/segmentation_utils/src/cell_segmentation.py. Otherwise LGTM.
