Refine quarto by pdimens · Pull Request #204 · pdimens/harpy

pdimens · 2025-02-13T17:36:21Z

This PR:

uses a new model to pull the quarto configs from github, ensuring reports use the latest design regardless of harpy version (after this release)
addressess base figure height for per-sample plots to give them moer vertical space
restores the nicer lumen theme that was used when reports were made with flexdashboard. It's not a 1:1, but it's close and the favicon is back.

Summary by CodeRabbit

New Features
- Introduced dynamic configuration file retrieval with online sourcing and local fallback, providing clearer error messages.
- Integrated additional styling resources into reports for flexible theming.
- Added error handling for input data files to prevent processing with empty datasets.
- Enhanced reporting capabilities by incorporating new input files and modifying output structures for better management.
- Improved user interaction with tooltips in data tables for enhanced clarity.
- Added new metrics and visualizations in reports, enhancing data presentation.
Refactor
- Streamlined report generation workflows with unified inputs and outputs across pipelines.
- Removed outdated functionality to simplify processing.
Style
- Adjusted figure height calculations across reports for enhanced visual layout and improved readability.
- Updated datatable options for better layout management.
- Added fillContainer option to datatables for improved layout.

coderabbitai · 2025-02-13T17:36:29Z

Warning

Rate limit exceeded

@pdimens has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 13 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between fd811a6 and 285b35d.

📒 Files selected for processing (6)

.github/workflows/tests.yml (35 hunks)
harpy/_cli_types_params.py (22 hunks)
harpy/qc.py (1 hunks)
harpy/reports/bcftools_stats.qmd (4 hunks)
harpy/reports/hapcut.qmd (5 hunks)
harpy/snakefiles/qc.smk (2 hunks)

📝 Walkthrough

Walkthrough

The changes implement a new configuration file retrieval mechanism in the Harpy application. The logic now attempts to download configuration files (both YAML and SCSS) from a specified GitHub URL and, if that fails, falls back to local file retrieval with improved error messaging. In parallel, several report files have been updated to adjust figure dimensions, remove redundant default handling, and parameterize file inputs. Additionally, numerous Snakemake workflows have been refactored to accept both YAML and SCSS inputs—with shell commands replaced by Python run blocks using shutil.copy—and deprecated indexing functionalities have been removed.

Changes

File(s)	Change Summary
`harpy/_misc.py` `harpy/reports/_quarto.yml`	Updated configuration fetching: Added a try-except mechanism to first download `_quarto.yml` and `_harpy.scss` from GitHub with a fallback to local retrieval; updated theme stylesheet reference from `harpy.scss` to `_harpy.scss`.
`harpy/reports/align_bxstats.qmd` `harpy/reports/align_stats.qmd` `harpy/reports/bcftools_stats.qmd` `harpy/reports/bx_count.qmd` `harpy/reports/impute.qmd` `harpy/reports/leviathan_pop.qmd` `harpy/reports/preflight_fastq.qmd`	Modified report documents: Adjusted figure height calculations (increasing the base height value in multiple files), removed now-unwanted NA/NaN default handling, parameterized file inputs, and introduced error handling for empty or missing data files.
All `harpy/snakefiles/.smk` (e.g., `align_bwa.smk`, `align_ema.smk`, `align_strobealign.smk`, `impute.smk`, `phase.smk`, `preflight_bam.smk`, `preflight_fastq.smk`, `qc.smk`, `snp_freebayes.smk`, `snp_mpileup.smk`, `sv_leviathan.smk`, `sv_leviathan_pop.smk`, `sv_naibr.smk`)	Updated Snakemake workflows: Modified report-related rules to accept two inputs (YAML and SCSS), updated output declarations to use temporary files, replaced shell commands with Python run blocks utilizing `shutil.copy`, removed deprecated BAM indexing functions and rules, and streamlined input parameter handling across multiple rules.

Sequence Diagram(s)

sequenceDiagram
    participant F as Fetch_Report Function
    participant U as GitHub URL
    participant L as Local File System
    participant E as Error Handler

    F->>U: Attempt to download _quarto.yml / _harpy.scss
    alt Download Successful
        U-->>F: Return configuration file
    else Download Fails
        F->>L: Retrieve file from local installation
        L-->>F: Return local configuration file
        F->>E: Log detailed error message (if needed)
    end
    F->>F: Proceed with report generation using retrieved files

sequenceDiagram
    participant R as Report_Config Rule
    participant I as Input Files (YAML & SCSS)
    participant S as Python Run Block (shutil.copy)
    participant O as Temporary Output Files

    R->>I: Receive YAML & SCSS inputs
    R->>S: Invoke run block to copy files
    S->>O: Generate temporary YAML and SCSS files for workflow
    O-->>R: Output available for downstream reporting rules

Possibly related PRs

tweaks to the alignstats reports and lumen theme #203: Modifies the _quarto.yml configuration retrieval and theme settings, aligning closely with the new file fetching logic and stylesheet updates.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

🔭 Outside diff range comments (2)

harpy/snakefiles/preflight_bam.smk (1)

12-24: 🛠️ Refactor suggestion

Consider revising the discount and fee structure.

The current implementation has a potential business logic issue: adding a flat $20 fee to discounted bills can negate the discount benefits, especially for smaller purchases. For example:

$100 purchase with 10% discount = $90

After $20 fee = $110 (more expensive than without discount)

This could lead to customer dissatisfaction and defeat the purpose of the loyalty program.

Consider one of these alternatives:

Apply fee before discount

Scale fee based on purchase amount

Remove fee for loyal customers

Would you like me to propose a revised implementation that better aligns with customer loyalty incentives?

harpy/snakefiles/align_bwa.smk (1)

12-24: 🛠️ Refactor suggestion

Revise billing logic to prevent discount negation.

The current implementation adds a flat $20 fee after applying the discount, which can result in a higher final price than the original amount for small purchases. For example:

$100 purchase with 10% discount = $90 + $20 fee = $110 (more than original price)

Consider these alternatives:

Apply fee before discount

Scale fee based on purchase amount

Set minimum purchase amount for fee application

Would you like me to provide an implementation for any of these approaches?

🧹 Nitpick comments (1)

harpy/snakefiles/align_strobealign.smk (1)
201-208: Consider using absolute paths for QMD files.

While the changes correctly integrate the new configuration system, using relative paths for QMD files could cause issues if the workflow is called from a different directory.
-        qmd = f"{outdir}/workflow/report/align_stats.qmd"
+        qmd = os.path.join(os.getcwd(), outdir, "workflow", "report", "align_stats.qmd")

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e65aa1 and 31b331c.

📒 Files selected for processing (25)

harpy/_misc.py (2 hunks)
harpy/reports/_quarto.yml (1 hunks)
harpy/reports/align_bxstats.qmd (1 hunks)
harpy/reports/align_stats.qmd (1 hunks)
harpy/reports/bcftools_stats.qmd (1 hunks)
harpy/reports/bx_count.qmd (1 hunks)
harpy/reports/impute.qmd (1 hunks)
harpy/reports/leviathan_pop.qmd (0 hunks)
harpy/reports/preflight_fastq.qmd (0 hunks)
harpy/snakefiles/align_bwa.smk (2 hunks)
harpy/snakefiles/align_ema.smk (2 hunks)
harpy/snakefiles/align_strobealign.smk (2 hunks)
harpy/snakefiles/impute.smk (2 hunks)
harpy/snakefiles/phase.smk (1 hunks)
harpy/snakefiles/preflight_bam.smk (1 hunks)
harpy/snakefiles/preflight_fastq.smk (1 hunks)
harpy/snakefiles/qc.smk (1 hunks)
harpy/snakefiles/snp_freebayes.smk (1 hunks)
harpy/snakefiles/snp_mpileup.smk (1 hunks)
harpy/snakefiles/sv_leviathan.smk (1 hunks)
harpy/snakefiles/sv_leviathan_pop.smk (2 hunks)
harpy/snakefiles/sv_naibr.smk (1 hunks)
harpy/snakefiles/sv_naibr_phase.smk (1 hunks)
harpy/snakefiles/sv_naibr_pop.smk (2 hunks)
harpy/snakefiles/sv_naibr_pop_phase.smk (2 hunks)

💤 Files with no reviewable changes (2)

harpy/reports/leviathan_pop.qmd
harpy/reports/preflight_fastq.qmd

✅ Files skipped from review due to trivial changes (2)

harpy/reports/_quarto.yml
harpy/reports/bx_count.qmd

🧰 Additional context used

🪛 Ruff (0.8.2)

harpy/_misc.py

94-94: Do not use bare except

(E722)

99-99: f-string without any placeholders

Remove extraneous f prefix

(F541)

108-108: Do not use bare except

(E722)

113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (38)

harpy/snakefiles/preflight_bam.smk (1)

52-62: LGTM! Clean implementation of the report configuration rule.

The changes properly handle both YAML and SCSS files, and correctly mark outputs as temporary.

harpy/snakefiles/snp_freebayes.smk (2)

195-206: LGTM! Clean implementation of SCSS file handling.

The changes to report_config rule properly handle both YAML and SCSS files, making them temporary outputs and using Python's shutil for efficient file operations.

207-262: LGTM! Report generation properly integrates SCSS styling.

The variant_report rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/snakefiles/sv_leviathan.smk (2)

186-209: LGTM! Consistent implementation of report configuration.

The changes to report_config rule maintain consistency with other workflow files.

198-233: LGTM! Sample reports properly integrate styling.

The sample_reports rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/snakefiles/sv_naibr.smk (2)

198-209: LGTM! Consistent implementation of report configuration.

The changes to report_config rule maintain consistency with other workflow files.

210-233: LGTM! Sample reports properly integrate styling.

The sample_reports rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/snakefiles/snp_mpileup.smk (2)

230-241: LGTM! Consistent implementation of report configuration.

The changes to report_config rule maintain consistency with other workflow files.

242-262: LGTM! Report generation properly integrates styling.

The variant_report rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/reports/impute.qmd (1)

336-336: LGTM! Improved figure height calculation.

The increased base height from 1 to 3 provides better vertical space for per-sample plots, enhancing visibility and readability.

harpy/snakefiles/phase.smk (1)

258-266: LGTM! Improved configuration file handling.

The changes enhance the report configuration by:

Explicitly handling both YAML and SCSS files

Using Python's shutil.copy for more reliable file operations

Efficiently handling multiple files with zip iteration

harpy/snakefiles/sv_naibr_phase.smk (1)

258-266: LGTM! Consistent configuration file handling.

The changes maintain consistency with other workflow files by using the same robust approach for handling YAML and SCSS configurations.

harpy/snakefiles/sv_naibr_pop.smk (1)

231-239: LGTM! Consistent configuration handling across codebase.

The changes maintain the standardized approach for handling YAML and SCSS configurations, ensuring consistency throughout the entire codebase.

harpy/reports/bcftools_stats.qmd (2)

23-23: LGTM! Improved flexibility with parameterized input.

The change from hardcoded path to using params$infile makes the code more configurable and reusable.

340-340: LGTM! Enhanced plot visibility with increased base height.

The change from 1 + (0.6 * nrow(psc)) to 3 + (0.6 * nrow(psc)) provides more vertical space for per-sample plots while maintaining proportional scaling with the number of samples.

harpy/reports/align_bxstats.qmd (1)

131-131: LGTM! Consistent enhancement of plot visibility.

The change from 1 + (0.4 * n_samples) to 3 + (0.4 * n_samples) aligns with similar changes in other files, providing better visualization while maintaining proportional scaling.

harpy/snakefiles/sv_naibr_pop_phase.smk (3)

316-324: LGTM! Enhanced report configuration with improved file handling.

The changes improve the report configuration by:

Adding SCSS styling support

Using Python's shutil.copy for robust file handling

Properly marking outputs as temporary for cleanup

328-330: LGTM! Consistent styling support for sample reports.

Added SCSS file to inputs, maintaining consistency with the report configuration changes.

353-354: LGTM! Consistent styling support for aggregate reports.

Added SCSS file to inputs, maintaining consistency with other report-related changes.

harpy/snakefiles/align_ema.smk (3)

291-299: LGTM! Consistent report configuration implementation.

The changes mirror those in other Snakemake files, maintaining consistency in:

SCSS styling support

Python-based file handling

Temporary output management

303-304: LGTM! Consistent styling support in sample reports.

Added SCSS file to inputs, maintaining consistency across the codebase.

362-363: LGTM! Consistent styling support in barcode reports.

Added SCSS file to inputs, maintaining consistency with other report-related changes.

harpy/reports/align_stats.qmd (4)

34-37: LGTM! Robust error handling for empty input files.

The added error handling prevents report generation when input files are empty, with clear error messages. This improves the robustness of the reporting process.

Also applies to: 379-382, 384-387

1-700: LGTM! Well-structured and organized code.

The code is logically organized into clear sections with descriptive headers and comments, making it maintainable and easy to follow.

203-259: LGTM! Well-implemented data processing and visualization.

The data processing and visualization code is well-implemented with proper error handling, clear labels, and interactive features. The highcharter configurations are comprehensive and user-friendly.

Also applies to: 272-295, 307-329, 337-359, 478-498

666-697: LGTM! Comprehensive and clear documentation.

The documentation thoroughly explains technical terms, file formats, and data interpretations, making the report accessible to users.

harpy/snakefiles/preflight_fastq.smk (2)

67-77: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.

79-97: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

harpy/snakefiles/qc.smk (2)

150-161: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.

162-182: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

harpy/snakefiles/impute.smk (3)

123-134: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.

135-164: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

227-256: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

harpy/snakefiles/sv_leviathan_pop.smk (3)

241-252: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.

253-276: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

278-301: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.
harpy/snakefiles/align_strobealign.smk (2)

188-199: LGTM! Clean refactor of report configuration handling.

The changes improve modularity by separating YAML and SCSS configurations, and the use of Python's shutil.copy is more robust than shell commands.

274-280: Use absolute paths for QMD files here as well.

Similar to the sample_reports rule, consider using absolute paths for QMD files to ensure robustness.
-        qmd = f"{outdir}/workflow/report/align_bxstats.qmd"
+        qmd = os.path.join(os.getcwd(), outdir, "workflow", "report", "align_bxstats.qmd")

harpy/_misc.py

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

harpy/snakefiles/phase.smk (1)
268-288: LGTM! Consider using Python's shutil.copy for consistency.

The changes correctly integrate the SCSS styling and handle QMD file processing. For consistency with the report_config rule, consider using Python's shutil.copy instead of the shell cp command.
-        cp {input.qmd} {output.qmd}
+    run:
+        import shutil
+        shutil.copy(input.qmd, output.qmd)
         INFILE=$(realpath {input.data})
         quarto render {output.qmd} --log {log} --quiet -P blockfile:$INFILE

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31b331c and 35437ea.

📒 Files selected for processing (2)

harpy/_misc.py (2 hunks)
harpy/snakefiles/phase.smk (1 hunks)

🧰 Additional context used

🪛 Ruff (0.8.2)

harpy/_misc.py

94-94: Do not use bare except

(E722)

99-99: f-string without any placeholders

Remove extraneous f prefix

(F541)

108-108: Do not use bare except

(E722)

113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms (1)

GitHub Check: Build/Cache Container

🔇 Additional comments (3)

harpy/snakefiles/phase.smk (1)

256-267: LGTM! Clean implementation of the configuration file handling.

The changes effectively handle both YAML and SCSS files using a concise Python implementation. The use of temporary files and shutil.copy is appropriate for this task.
harpy/_misc.py (2)

8-8: LGTM!

The urllib.request import is correctly placed and necessary for the new GitHub file retrieval functionality.

89-115: Previous review comments are still applicable.

The suggested improvements from previous reviews regarding error handling, code duplication, and error message consistency are still valid.

Add file operation safeguards.

The current implementation could lead to race conditions or partial writes if there are network interruptions during file downloads. Consider using temporary files and atomic moves for safer file operations.
 def fetch_config_file(file_name, github_path, workdir):
     try:
         destination = f"{workdir}/report/{file_name}"
+        temp_destination = f"{destination}.tmp"
         with urllib.request.urlopen(github_path) as response, \
-             open(destination, 'w') as out_file:
+             open(temp_destination, 'w') as out_file:
             out_file.write(response.read().decode("utf-8"))
+        os.replace(temp_destination, destination)
     except Exception as e:
         if os.path.exists(temp_destination):
             os.unlink(temp_destination)
🧰 Tools

🪛 Ruff (0.8.2)

94-94: Do not use bare except

(E722)

99-99: f-string without any placeholders

Remove extraneous f prefix

(F541)

108-108: Do not use bare except

(E722)

113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (6)

harpy/reports/leviathan.qmd (2)
125-137: LGTM! The datatable configuration enhances report appearance.

The addition of fillContainer = T improves the table's responsiveness and visual presentation.

Consider using TRUE instead of T for better code readability and consistency with R best practices:
-  fillContainer = T,
+  fillContainer = TRUE,
148-160: LGTM! Consistent datatable configuration applied.

The same responsive layout enhancement is consistently applied to the variants per contig table.

Similarly, consider using TRUE instead of T:
-  fillContainer = T,
+  fillContainer = TRUE,
harpy/reports/align_bxstats.qmd (1)
92-93: Consider using tidyr::replace_na for better readability.

The current NA handling could be more idiomatic using tidyr's functions.
-aggregate_df[is.na(aggregate_df)] <- 0
+aggregate_df <- aggregate_df %>% replace_na(list(
+  totalreads = 0,
+  totaluniquemol = 0,
+  # ... other columns ...
+))
.github/workflows/tests.yml (1)
106-119: Consider cleaning up commented singularity artifact sections.

Multiple jobs contain identical commented-out sections for downloading singularity artifacts. If these sections are no longer needed, consider removing them to improve code maintainability. If they're planned for future use, consider adding a TODO comment explaining the plan.
-#      - name: Download Singularity Artifact
-#        uses: actions/download-artifact@v4
-#        with:
-#          name: deps-image
-#          path: .snakemake/singularity
Also applies to: 147-151, 185-189, 227-231, 267-271, 304-308, 342-346, 379-384, 418-422, 459-463, 500-504, 542-546, 589-594, 633-637, 684-689, 743-747, 783-787
harpy/reports/hapcut.qmd (2)
263-280: Reduce code duplication by extracting the headerCallback function.

The headerCallback function implementation is duplicated between per-contig and per-sample tables. Consider extracting it into a reusable function.

Create a helper function at the beginning of the file:
create_header_callback <- function(column_description) {
  c(
    "function(thead, data, start, end, display){",
    paste0(" var tooltips = ['", paste(column_description, collapse = "','"), "'];"),
    "  for(var i=0; i<tooltips.length; i++){",
    "    $('th:eq('+i+')',thead).attr('title', tooltips[i]);",
    "  }",
    "}"
  )
}
Then use it for both tables:
- headerCallback <- c(
-   "function(thead, data, start, end, display){",
-   paste0(" var tooltips = ['", paste(column_description, collapse = "','"), "'];"),
-   "  for(var i=0; i<5; i++){",
-   "    $('th:eq('+i+')',thead).attr('title', tooltips[i]);",
-   "  }",
-   "}"
- )
+ headerCallback <- create_header_callback(column_description)
Also applies to: 322-339

263-271: Standardize tooltip descriptions between tables.

The descriptions for similar columns in per-contig and per-sample tables have slight variations. Consider standardizing them for consistency.

For example:

Per-contig: "total haplotypes on the contig"

Per-sample: "number of haplotypes"

Consider using consistent phrasing across both tables.

Also applies to: 322-330

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19ea4c4 and 2055ece.

📒 Files selected for processing (11)

.github/workflows/tests.yml (18 hunks)
harpy/reports/align_bxstats.qmd (4 hunks)
harpy/reports/align_stats.qmd (5 hunks)
harpy/reports/bcftools_stats.qmd (2 hunks)
harpy/reports/bx_count.qmd (4 hunks)
harpy/reports/hapcut.qmd (4 hunks)
harpy/reports/impute.qmd (3 hunks)
harpy/reports/leviathan.qmd (2 hunks)
harpy/reports/leviathan_pop.qmd (3 hunks)
harpy/reports/preflight_bam.qmd (1 hunks)
harpy/reports/preflight_fastq.qmd (1 hunks)

✅ Files skipped from review due to trivial changes (1)

harpy/reports/preflight_bam.qmd

🚧 Files skipped from review as they are similar to previous changes (6)

harpy/reports/impute.qmd
harpy/reports/leviathan_pop.qmd
harpy/reports/bx_count.qmd
harpy/reports/preflight_fastq.qmd
harpy/reports/align_stats.qmd
harpy/reports/bcftools_stats.qmd

⏰ Context from checks skipped due to timeout of 90000ms (18)

GitHub Check: align strobe
GitHub Check: simulate variants
GitHub Check: simulate linkedreads
GitHub Check: qc
GitHub Check: preflight
GitHub Check: phase
GitHub Check: sv naibr
GitHub Check: mpileup
GitHub Check: sv leviathan
GitHub Check: impute
GitHub Check: freebayes
GitHub Check: harpy extras
GitHub Check: align EMA
GitHub Check: demux gen1
GitHub Check: metassembly
GitHub Check: deconvolve
GitHub Check: Rebuild Container
GitHub Check: align BWA

🔇 Additional comments (8)

harpy/reports/align_bxstats.qmd (5)

28-35: LGTM! Well-implemented utility function.

The avg_percent_of function is a clean implementation that correctly handles zero values and provides precise percentage calculations.

110-171: LGTM! Value boxes enhance report readability.

The addition of value boxes provides a clear, at-a-glance summary of key metrics. The implementation correctly handles edge cases by filtering out zero values when calculating averages.

176-205: LGTM! Excellent documentation with tooltips.

The implementation of column descriptions and tooltips significantly improves the user experience by providing clear explanations for each metric.

224-226: LGTM! Figure height adjustment aligns with PR objectives.

The change from 1 + (0.4 * n_samples) to 3 + (0.4 * n_samples) provides more vertical space for per-sample plots, improving visibility as intended in the PR objectives.

271-336: LGTM! Informative new visualizations.

The addition of "Reads per molecule" distribution and "Linked-read library performance" scatter plot provides valuable insights into library quality and sequencing performance. The implementation includes proper error handling for cases with insufficient unique values.
.github/workflows/tests.yml (2)

58-60: LGTM! Job name and description changes are clear and descriptive.

The renaming from pkgbuild to containerize better reflects the job's purpose, and the description update from "Build/Cache Container" to "Rebuild Container" is more precise.

121-800:
Details

✅ Verification successful

Verify the impact of removing container build dependencies.

The removal of the pkgbuild (now containerize) dependency from multiple jobs could improve workflow execution time through increased parallelization. However, please verify that these jobs can safely run without waiting for the container to be built.

Run this script to check if any of these jobs use the container:

🏁 Script executed:
#!/bin/bash
# Description: Check for container usage in job definitions

# Search for container references in job steps
rg -A 5 'container:|image:|uses: docker' 

# Search for singularity/apptainer references
rg -A 5 'singularity|apptainer'
Length of output: 60293
Container Dependency Verification Confirmed – Dependency Removal is Safe

Our analysis indicates that none of the jobs in the tests workflow require an actively built container. In the repository’s Snakefiles and harpy modules, container usage is consistently set to “None” (or conditionally bypassed via “conda” vs. “conda apptainer” logic), confirming that the former dependency on container building (previously via pkgbuild) is not essential for these jobs. This change should improve parallelization without impacting functionality.

• Verify that no production workflow unexpectedly depends on container builds.
• Confirm that tests executing harpy commands continue to run as expected in your CI environment.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 312-312: trailing spaces

(trailing-spaces)

[error] 429-429: trailing spaces

(trailing-spaces)
harpy/reports/hapcut.qmd (1)

1-386: LGTM! The implementation aligns well with the PR objectives.

The changes successfully enhance the report's functionality by adding informative tooltips to the data tables, improving user experience and data interpretation. The code is well-documented and organized.

harpy/reports/hapcut.qmd

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

harpy/reports/align_bxstats.qmd (1)
272-337: Enhanced visualization with new sections and plots.

The new sections for "Reads per molecule" and "Linked-read library performance" provide valuable insights with well-implemented visualizations. The plots include appropriate tooltips, titles, and export options.

Consider adding a brief description of what constitutes "good" performance metrics to help users interpret the results.
 This scatterplot is a diagnostic that shows the relationship between the proportion of nonsingleton
 reads (reads with linked information) compared to total reads. Each point is a sample and is colored by the
 mean number of reads per molecule for that sample.
+
+Generally, good performance is indicated by:
+- A high proportion of reads with linked-read information (>0.7)
+- Consistent mean reads per molecule across samples (typically 8-12 reads)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2055ece and a546e85.

📒 Files selected for processing (6)

harpy/reports/align_bxstats.qmd (4 hunks)
harpy/reports/align_stats.qmd (5 hunks)
harpy/reports/bcftools_stats.qmd (2 hunks)
harpy/reports/bx_count.qmd (4 hunks)
harpy/reports/hapcut.qmd (4 hunks)
harpy/reports/impute.qmd (3 hunks)

🚧 Files skipped from review as they are similar to previous changes (5)

harpy/reports/impute.qmd
harpy/reports/hapcut.qmd
harpy/reports/align_stats.qmd
harpy/reports/bx_count.qmd
harpy/reports/bcftools_stats.qmd

⏰ Context from checks skipped due to timeout of 90000ms (18)

GitHub Check: align strobe
GitHub Check: simulate variants
GitHub Check: simulate linkedreads
GitHub Check: qc
GitHub Check: preflight
GitHub Check: phase
GitHub Check: sv naibr
GitHub Check: mpileup
GitHub Check: sv leviathan
GitHub Check: impute
GitHub Check: demux gen1
GitHub Check: freebayes
GitHub Check: harpy extras
GitHub Check: align EMA
GitHub Check: metassembly
GitHub Check: deconvolve
GitHub Check: align BWA
GitHub Check: Rebuild Container

🔇 Additional comments (4)

harpy/reports/align_bxstats.qmd (4)

28-35: LGTM! Well-implemented utility function.

The avg_percent_of function is a clean implementation that correctly handles zero values and provides accurate percentage calculations.

110-172: Great addition of value boxes for key metrics!

The value boxes provide an excellent high-level overview of important statistics. The implementation correctly handles edge cases by filtering out zero values.

177-206: Excellent enhancement with tooltips for column descriptions!

The header tooltips significantly improve user experience by providing clear descriptions for each column. The implementation is clean and maintainable.

225-227: Improved figure height calculation aligns with PR objectives.

The change from 1 + (0.4 * n_samples) to 3 + (0.4 * n_samples) provides better vertical space for per-sample plots, which directly addresses one of the PR objectives.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

harpy/_cli_types_params.py (2)

193-205: Consider simplifying NAIBR parameter validation.

The current implementation uses index-based validation for argument-value pairs, which can be fragile. Consider using a more robust approach by collecting pairs first.

Here's a suggested improvement:

-        for idx,i in enumerate(shellsplit(value)):
-            if i.startswith("-"):
-                self.fail(f"{i} begins with a dash, which is the wrong format. Try using " + i.lstrip("-") + " VAL instead", param, ctx)
-            # if it's an even index, it's the argument name of an arg-val pair
-            if idx % 2 == 0:
-                opts += 1
-                if i in harpy_options:
-                    self.fail(f"{i} is already used by Harpy when calling naibr.", param, ctx)
-                if i not in valid_options:
-                    self.fail(f"{i} is not a valid naibr option. See the naibr documentation for a list of available options: {docs}.", param, ctx)
+        args = shellsplit(value)
+        if len(args) % 2 != 0:
+            self.fail("NAIBR options must be provided in pairs of ARG VAL", param, ctx)
+        
+        for arg, _ in zip(args[::2], args[1::2]):
+            if arg.startswith("-"):
+                self.fail(f"{arg} begins with a dash, which is the wrong format. Try using {arg.lstrip('-')} VAL instead", param, ctx)
+            
+            opts += 1
+            if arg in harpy_options:
+                self.fail(f"{arg} is already used by Harpy when calling naibr.", param, ctx)
+            if arg not in valid_options:
+                self.fail(f"{arg} is not a valid naibr option. See the naibr documentation for a list of available options: {docs}.", param, ctx)

245-249: Remove unnecessary blank lines at the end of file.

There are 5 unnecessary blank lines at the end of the file.

-
-
-
-
-

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a546e85 and 6871686.

📒 Files selected for processing (1)

harpy/_cli_types_params.py (13 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (18)

GitHub Check: align strobe
GitHub Check: simulate variants
GitHub Check: simulate linkedreads
GitHub Check: qc
GitHub Check: preflight
GitHub Check: phase
GitHub Check: sv naibr
GitHub Check: mpileup
GitHub Check: sv leviathan
GitHub Check: impute
GitHub Check: freebayes
GitHub Check: harpy extras
GitHub Check: align EMA
GitHub Check: demux gen1
GitHub Check: metassembly
GitHub Check: align BWA
GitHub Check: deconvolve
GitHub Check: Rebuild Container

🔇 Additional comments (1)

harpy/_cli_types_params.py (1)

14-14: Great improvement in command-line argument parsing!

The switch from value.split() to shellsplit across all parameter validation classes is a significant improvement that:

Properly handles quoted strings containing spaces

Correctly processes escaped characters

Follows shell-like syntax rules

Prevents argument injection vulnerabilities

This change makes the parameter parsing more robust and secure.

Also applies to: 33-33, 52-52, 71-71, 92-92, 111-111, 133-133, 155-155, 174-174, 193-193, 215-215, 234-234

harpy/_cli_types_params.py

…jection

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

harpy/snakefiles/impute.smk (1)

29-39: LGTM! Good parameter management.

The introduction of a static dictionary for STITCH arguments improves code organization and maintainability. Consider adding a comment explaining the significance of the default values, particularly for niterations=40 and switchModelIteration=39.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6871686 and 666d188.

📒 Files selected for processing (4)

harpy/_cli_types_params.py (21 hunks)
harpy/impute.py (1 hunks)
harpy/scripts/stitch_impute.R (0 hunks)
harpy/snakefiles/impute.smk (5 hunks)

💤 Files with no reviewable changes (1)

harpy/scripts/stitch_impute.R

⏰ Context from checks skipped due to timeout of 90000ms (18)

GitHub Check: align strobe
GitHub Check: simulate variants
GitHub Check: simulate linkedreads
GitHub Check: qc
GitHub Check: preflight
GitHub Check: phase
GitHub Check: mpileup
GitHub Check: sv naibr
GitHub Check: sv leviathan
GitHub Check: impute
GitHub Check: harpy extras
GitHub Check: freebayes
GitHub Check: align EMA
GitHub Check: demux gen1
GitHub Check: metassembly
GitHub Check: Rebuild Container
GitHub Check: align BWA
GitHub Check: deconvolve

🔇 Additional comments (7)

harpy/impute.py (1)

54-54: LGTM! Clear and helpful docstring update.

The docstring now provides clear guidance on the required format for STITCH parameters, which aligns with the parameter validation in StitchParams class.

harpy/snakefiles/impute.smk (3)

91-94: LGTM! Well-structured rule updates.

The changes improve the rule by:

Simplifying input handling

Properly managing temporary directories

Using the centralized parameter management system

Also applies to: 100-101, 105-115

142-152: LGTM! Improved report configuration handling.

The changes enhance the report generation system by:

Supporting both YAML and SCSS files

Using Python's built-in file operations instead of shell commands

Aligning with the PR's objective to enhance report generation

248-249: LGTM! Consistent report handling.

The changes maintain consistency with the updated report generation system introduced in the report_config rule.

Also applies to: 254-255

harpy/_cli_types_params.py (3)

4-5: LGTM! Import statements fixed.

The import statements have been correctly updated to use from shlex import syntax.

133-154: LGTM! Enhanced parameter validation.

The changes improve the StitchParams class by:

Enforcing stricter parameter format validation

Providing more descriptive error messages

Maintaining consistency with the docstring in impute.py

15-24: LGTM! Consistent parameter handling improvements.

The changes enhance all parameter classes by:

Using shellsplit for proper command-line argument parsing

Using shellquote for proper shell escaping

Maintaining consistency across all classes

Also applies to: 34-43, 53-62, 72-81, 93-102, 112-127, 137-152, 164-173, 183-192, 202-214, 224-233, 243-252

pdimens added 4 commits February 13, 2025 11:42

make rule pull report configs from repo

cfec301

new model for report formatting

4b6c3fc

adjust base height for better spacing

442d243

rm deprecated comments

6018610

Merge remote-tracking branch 'origin/main' into refine_quarto

31b331c

coderabbitai bot reviewed Feb 13, 2025

View reviewed changes

harpy/_misc.py Show resolved Hide resolved

harpy/_misc.py Show resolved Hide resolved

correct filenames

35437ea

coderabbitai bot reviewed Feb 13, 2025

View reviewed changes

pdimens added 5 commits February 13, 2025 13:50

rm dep on pkgbuild for testing

19ea4c4

fix needs

2560c1f

add hover tooltip

6bb8c09

add tooltips to tables

57db855

remove filtering table, add filtering plot

2055ece

coderabbitai bot reviewed Feb 13, 2025

View reviewed changes

harpy/reports/hapcut.qmd Show resolved Hide resolved

harpy/reports/hapcut.qmd Show resolved Hide resolved

pdimens added 3 commits February 13, 2025 17:07

add tooltips to all columns, fix typos

f8e5a15

fix

4f4b86d

imrpove styling, add markers back

a546e85

coderabbitai bot reviewed Feb 13, 2025

View reviewed changes

lean on shlex.split() for more rigorous extra param splitting

6871686

coderabbitai bot reviewed Feb 13, 2025

View reviewed changes

harpy/_cli_types_params.py Outdated Show resolved Hide resolved

pdimens added 3 commits February 14, 2025 09:59

return shell-quoted params as a security measure against malicious in…

5463589

…jection

more bulletproofing

a3374cf

overhaul impute to use STITCH CLI

666d188

coderabbitai bot reviewed Feb 14, 2025

View reviewed changes

pdimens added 5 commits February 14, 2025 11:29

fix quoting

fd811a6

skip clearing space since we are skipping containers

ae3d134

restore clearing space for containerization

d7ca349

fix silly silly typo

02343dd

Add routines to reports for single-sample runs #205

1a9940b

improve dedup text, formally cut it off

285b35d

pdimens merged commit 9b923ad into main Feb 14, 2025
19 checks passed

pdimens deleted the refine_quarto branch February 14, 2025 17:17

Conversation

pdimens commented Feb 13, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Feb 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

pdimens commented Feb 13, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Feb 13, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)