Merged
Conversation
There was a problem hiding this comment.
Pull request overview
This PR clarifies that OGU coverage values can be provided as either fractions or percentages, and makes corresponding updates to ensure consistency across documentation, code, and error messages. The key change is removing default values for min_coverage parameters to force users to explicitly specify thresholds that match their coverage value type (e.g., 0.01 for fractions vs 1 for percentages). The PR also adds explicit float casting for coverage values to handle cases where coverage data is read in as strings.
Key Changes
- Removed
DEFAULT_MIN_PERCENT_COVERAGEconstant and all default values formin_coverageparameters across public API functions - Added
OGU_AGNOSTIC_COVERAGE_KEYconstant for internal use in long-format dataframes while maintainingOGU_PERCENT_COVERAGE_KEYfor user-facing input - Updated all documentation and error messages to refer to generic "coverage" instead of "% coverage" or "percent coverage" where appropriate
- Added explicit float casting in
_generate_ogu_coverages_per_sample_dfto handle string input
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| pysyndna/src/calc_cell_counts.py | Main changes including removal of DEFAULT_MIN_PERCENT_COVERAGE constant, addition of OGU_AGNOSTIC_COVERAGE_KEY for internal processing, parameter renaming from ogu_percent_coverage_df to ogu_coverage_df throughout, comprehensive documentation updates to clarify fraction vs percentage usage, removal of min_coverage defaults from all public functions, and addition of float casting for coverage columns |
| pysyndna/tests/test_calc_cell_counts.py | Test data structure updates to use OGU_AGNOSTIC_COVERAGE_KEY in expected output dictionaries, error message updates to remove "%" references, and addition of explicit string casting test for coverages_df to verify the new float casting functionality |
| pysyndna/src/fit_syndna_models.py | Minor comment improvement to be more accurate about checking multiple input dataframes |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Remove descriptions throughout of coverage as necessarily being a percentage, updated comments to indicate it can be either a percentage OR a fraction and that the user must verify that the min_coverage value is of the same type (percentage or fraction) as the coverage values. To make type mismatches less likely, remove the default value for the min_coverage parameter throughout so users must specify each time (hopefully in a way that matches the coverages they specify). Add explicit cast of coverage values to floating point in case coverage df is read in as strings.