Skip to content

Refine quarto#204

Merged
pdimens merged 24 commits intomainfrom
refine_quarto
Feb 14, 2025
Merged

Refine quarto#204
pdimens merged 24 commits intomainfrom
refine_quarto

Conversation

@pdimens
Copy link
Copy Markdown
Owner

@pdimens pdimens commented Feb 13, 2025

This PR:

  • uses a new model to pull the quarto configs from github, ensuring reports use the latest design regardless of harpy version (after this release)
  • addressess base figure height for per-sample plots to give them moer vertical space
  • restores the nicer lumen theme that was used when reports were made with flexdashboard. It's not a 1:1, but it's close and the favicon is back.

Summary by CodeRabbit

  • New Features

    • Introduced dynamic configuration file retrieval with online sourcing and local fallback, providing clearer error messages.
    • Integrated additional styling resources into reports for flexible theming.
    • Added error handling for input data files to prevent processing with empty datasets.
    • Enhanced reporting capabilities by incorporating new input files and modifying output structures for better management.
    • Improved user interaction with tooltips in data tables for enhanced clarity.
    • Added new metrics and visualizations in reports, enhancing data presentation.
  • Refactor

    • Streamlined report generation workflows with unified inputs and outputs across pipelines.
    • Removed outdated functionality to simplify processing.
  • Style

    • Adjusted figure height calculations across reports for enhanced visual layout and improved readability.
    • Updated datatable options for better layout management.
    • Added fillContainer option to datatables for improved layout.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 13, 2025

Warning

Rate limit exceeded

@pdimens has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 4 minutes and 13 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between fd811a6 and 285b35d.

📒 Files selected for processing (6)
  • .github/workflows/tests.yml (35 hunks)
  • harpy/_cli_types_params.py (22 hunks)
  • harpy/qc.py (1 hunks)
  • harpy/reports/bcftools_stats.qmd (4 hunks)
  • harpy/reports/hapcut.qmd (5 hunks)
  • harpy/snakefiles/qc.smk (2 hunks)
📝 Walkthrough

Walkthrough

The changes implement a new configuration file retrieval mechanism in the Harpy application. The logic now attempts to download configuration files (both YAML and SCSS) from a specified GitHub URL and, if that fails, falls back to local file retrieval with improved error messaging. In parallel, several report files have been updated to adjust figure dimensions, remove redundant default handling, and parameterize file inputs. Additionally, numerous Snakemake workflows have been refactored to accept both YAML and SCSS inputs—with shell commands replaced by Python run blocks using shutil.copy—and deprecated indexing functionalities have been removed.

Changes

File(s) Change Summary
harpy/_misc.py
harpy/reports/_quarto.yml
Updated configuration fetching: Added a try-except mechanism to first download _quarto.yml and _harpy.scss from GitHub with a fallback to local retrieval; updated theme stylesheet reference from harpy.scss to _harpy.scss.
harpy/reports/align_bxstats.qmd
harpy/reports/align_stats.qmd
harpy/reports/bcftools_stats.qmd
harpy/reports/bx_count.qmd
harpy/reports/impute.qmd
harpy/reports/leviathan_pop.qmd
harpy/reports/preflight_fastq.qmd
Modified report documents: Adjusted figure height calculations (increasing the base height value in multiple files), removed now-unwanted NA/NaN default handling, parameterized file inputs, and introduced error handling for empty or missing data files.
All harpy/snakefiles/*.smk (e.g., align_bwa.smk, align_ema.smk, align_strobealign.smk, impute.smk, phase.smk, preflight_bam.smk, preflight_fastq.smk, qc.smk, snp_freebayes.smk, snp_mpileup.smk, sv_leviathan.smk, sv_leviathan_pop.smk, sv_naibr*.smk) Updated Snakemake workflows: Modified report-related rules to accept two inputs (YAML and SCSS), updated output declarations to use temporary files, replaced shell commands with Python run blocks utilizing shutil.copy, removed deprecated BAM indexing functions and rules, and streamlined input parameter handling across multiple rules.

Sequence Diagram(s)

sequenceDiagram
    participant F as Fetch_Report Function
    participant U as GitHub URL
    participant L as Local File System
    participant E as Error Handler

    F->>U: Attempt to download _quarto.yml / _harpy.scss
    alt Download Successful
        U-->>F: Return configuration file
    else Download Fails
        F->>L: Retrieve file from local installation
        L-->>F: Return local configuration file
        F->>E: Log detailed error message (if needed)
    end
    F->>F: Proceed with report generation using retrieved files
Loading
sequenceDiagram
    participant R as Report_Config Rule
    participant I as Input Files (YAML & SCSS)
    participant S as Python Run Block (shutil.copy)
    participant O as Temporary Output Files

    R->>I: Receive YAML & SCSS inputs
    R->>S: Invoke run block to copy files
    S->>O: Generate temporary YAML and SCSS files for workflow
    O-->>R: Output available for downstream reporting rules
Loading

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🔭 Outside diff range comments (2)
harpy/snakefiles/preflight_bam.smk (1)

12-24: 🛠️ Refactor suggestion

Consider revising the discount and fee structure.

The current implementation has a potential business logic issue: adding a flat $20 fee to discounted bills can negate the discount benefits, especially for smaller purchases. For example:

  • $100 purchase with 10% discount = $90
  • After $20 fee = $110 (more expensive than without discount)

This could lead to customer dissatisfaction and defeat the purpose of the loyalty program.

Consider one of these alternatives:

  1. Apply fee before discount
  2. Scale fee based on purchase amount
  3. Remove fee for loyal customers

Would you like me to propose a revised implementation that better aligns with customer loyalty incentives?

harpy/snakefiles/align_bwa.smk (1)

12-24: 🛠️ Refactor suggestion

Revise billing logic to prevent discount negation.

The current implementation adds a flat $20 fee after applying the discount, which can result in a higher final price than the original amount for small purchases. For example:

  • $100 purchase with 10% discount = $90 + $20 fee = $110 (more than original price)

Consider these alternatives:

  1. Apply fee before discount
  2. Scale fee based on purchase amount
  3. Set minimum purchase amount for fee application

Would you like me to provide an implementation for any of these approaches?

🧹 Nitpick comments (1)
harpy/snakefiles/align_strobealign.smk (1)

201-208: Consider using absolute paths for QMD files.

While the changes correctly integrate the new configuration system, using relative paths for QMD files could cause issues if the workflow is called from a different directory.

-        qmd = f"{outdir}/workflow/report/align_stats.qmd"
+        qmd = os.path.join(os.getcwd(), outdir, "workflow", "report", "align_stats.qmd")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8e65aa1 and 31b331c.

📒 Files selected for processing (25)
  • harpy/_misc.py (2 hunks)
  • harpy/reports/_quarto.yml (1 hunks)
  • harpy/reports/align_bxstats.qmd (1 hunks)
  • harpy/reports/align_stats.qmd (1 hunks)
  • harpy/reports/bcftools_stats.qmd (1 hunks)
  • harpy/reports/bx_count.qmd (1 hunks)
  • harpy/reports/impute.qmd (1 hunks)
  • harpy/reports/leviathan_pop.qmd (0 hunks)
  • harpy/reports/preflight_fastq.qmd (0 hunks)
  • harpy/snakefiles/align_bwa.smk (2 hunks)
  • harpy/snakefiles/align_ema.smk (2 hunks)
  • harpy/snakefiles/align_strobealign.smk (2 hunks)
  • harpy/snakefiles/impute.smk (2 hunks)
  • harpy/snakefiles/phase.smk (1 hunks)
  • harpy/snakefiles/preflight_bam.smk (1 hunks)
  • harpy/snakefiles/preflight_fastq.smk (1 hunks)
  • harpy/snakefiles/qc.smk (1 hunks)
  • harpy/snakefiles/snp_freebayes.smk (1 hunks)
  • harpy/snakefiles/snp_mpileup.smk (1 hunks)
  • harpy/snakefiles/sv_leviathan.smk (1 hunks)
  • harpy/snakefiles/sv_leviathan_pop.smk (2 hunks)
  • harpy/snakefiles/sv_naibr.smk (1 hunks)
  • harpy/snakefiles/sv_naibr_phase.smk (1 hunks)
  • harpy/snakefiles/sv_naibr_pop.smk (2 hunks)
  • harpy/snakefiles/sv_naibr_pop_phase.smk (2 hunks)
💤 Files with no reviewable changes (2)
  • harpy/reports/leviathan_pop.qmd
  • harpy/reports/preflight_fastq.qmd
✅ Files skipped from review due to trivial changes (2)
  • harpy/reports/_quarto.yml
  • harpy/reports/bx_count.qmd
🧰 Additional context used
🪛 Ruff (0.8.2)
harpy/_misc.py

94-94: Do not use bare except

(E722)


99-99: f-string without any placeholders

Remove extraneous f prefix

(F541)


108-108: Do not use bare except

(E722)


113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (38)
harpy/snakefiles/preflight_bam.smk (1)

52-62: LGTM! Clean implementation of the report configuration rule.

The changes properly handle both YAML and SCSS files, and correctly mark outputs as temporary.

harpy/snakefiles/snp_freebayes.smk (2)

195-206: LGTM! Clean implementation of SCSS file handling.

The changes to report_config rule properly handle both YAML and SCSS files, making them temporary outputs and using Python's shutil for efficient file operations.


207-262: LGTM! Report generation properly integrates SCSS styling.

The variant_report rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/snakefiles/sv_leviathan.smk (2)

186-209: LGTM! Consistent implementation of report configuration.

The changes to report_config rule maintain consistency with other workflow files.


198-233: LGTM! Sample reports properly integrate styling.

The sample_reports rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/snakefiles/sv_naibr.smk (2)

198-209: LGTM! Consistent implementation of report configuration.

The changes to report_config rule maintain consistency with other workflow files.


210-233: LGTM! Sample reports properly integrate styling.

The sample_reports rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/snakefiles/snp_mpileup.smk (2)

230-241: LGTM! Consistent implementation of report configuration.

The changes to report_config rule maintain consistency with other workflow files.


242-262: LGTM! Report generation properly integrates styling.

The variant_report rule correctly incorporates the SCSS file while maintaining existing functionality.

harpy/reports/impute.qmd (1)

336-336: LGTM! Improved figure height calculation.

The increased base height from 1 to 3 provides better vertical space for per-sample plots, enhancing visibility and readability.

harpy/snakefiles/phase.smk (1)

258-266: LGTM! Improved configuration file handling.

The changes enhance the report configuration by:

  • Explicitly handling both YAML and SCSS files
  • Using Python's shutil.copy for more reliable file operations
  • Efficiently handling multiple files with zip iteration
harpy/snakefiles/sv_naibr_phase.smk (1)

258-266: LGTM! Consistent configuration file handling.

The changes maintain consistency with other workflow files by using the same robust approach for handling YAML and SCSS configurations.

harpy/snakefiles/sv_naibr_pop.smk (1)

231-239: LGTM! Consistent configuration handling across codebase.

The changes maintain the standardized approach for handling YAML and SCSS configurations, ensuring consistency throughout the entire codebase.

harpy/reports/bcftools_stats.qmd (2)

23-23: LGTM! Improved flexibility with parameterized input.

The change from hardcoded path to using params$infile makes the code more configurable and reusable.


340-340: LGTM! Enhanced plot visibility with increased base height.

The change from 1 + (0.6 * nrow(psc)) to 3 + (0.6 * nrow(psc)) provides more vertical space for per-sample plots while maintaining proportional scaling with the number of samples.

harpy/reports/align_bxstats.qmd (1)

131-131: LGTM! Consistent enhancement of plot visibility.

The change from 1 + (0.4 * n_samples) to 3 + (0.4 * n_samples) aligns with similar changes in other files, providing better visualization while maintaining proportional scaling.

harpy/snakefiles/sv_naibr_pop_phase.smk (3)

316-324: LGTM! Enhanced report configuration with improved file handling.

The changes improve the report configuration by:

  1. Adding SCSS styling support
  2. Using Python's shutil.copy for robust file handling
  3. Properly marking outputs as temporary for cleanup

328-330: LGTM! Consistent styling support for sample reports.

Added SCSS file to inputs, maintaining consistency with the report configuration changes.


353-354: LGTM! Consistent styling support for aggregate reports.

Added SCSS file to inputs, maintaining consistency with other report-related changes.

harpy/snakefiles/align_ema.smk (3)

291-299: LGTM! Consistent report configuration implementation.

The changes mirror those in other Snakemake files, maintaining consistency in:

  1. SCSS styling support
  2. Python-based file handling
  3. Temporary output management

303-304: LGTM! Consistent styling support in sample reports.

Added SCSS file to inputs, maintaining consistency across the codebase.


362-363: LGTM! Consistent styling support in barcode reports.

Added SCSS file to inputs, maintaining consistency with other report-related changes.

harpy/reports/align_stats.qmd (4)

34-37: LGTM! Robust error handling for empty input files.

The added error handling prevents report generation when input files are empty, with clear error messages. This improves the robustness of the reporting process.

Also applies to: 379-382, 384-387


1-700: LGTM! Well-structured and organized code.

The code is logically organized into clear sections with descriptive headers and comments, making it maintainable and easy to follow.


203-259: LGTM! Well-implemented data processing and visualization.

The data processing and visualization code is well-implemented with proper error handling, clear labels, and interactive features. The highcharter configurations are comprehensive and user-friendly.

Also applies to: 272-295, 307-329, 337-359, 478-498


666-697: LGTM! Comprehensive and clear documentation.

The documentation thoroughly explains technical terms, file formats, and data interpretations, making the report accessible to users.

harpy/snakefiles/preflight_fastq.smk (2)

67-77: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.


79-97: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

harpy/snakefiles/qc.smk (2)

150-161: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.


162-182: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

harpy/snakefiles/impute.smk (3)

123-134: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.


135-164: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.


227-256: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

harpy/snakefiles/sv_leviathan_pop.smk (3)

241-252: LGTM! Enhanced report configuration handling.

The changes improve the rule by adding SCSS file support and using Python's shutil.copy for more reliable file operations. Making outputs temporary helps with cleanup.


253-276: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.


278-301: LGTM! Added SCSS file support.

The rule now correctly includes the SCSS file in its inputs while maintaining existing functionality.

harpy/snakefiles/align_strobealign.smk (2)

188-199: LGTM! Clean refactor of report configuration handling.

The changes improve modularity by separating YAML and SCSS configurations, and the use of Python's shutil.copy is more robust than shell commands.


274-280: Use absolute paths for QMD files here as well.

Similar to the sample_reports rule, consider using absolute paths for QMD files to ensure robustness.

-        qmd = f"{outdir}/workflow/report/align_bxstats.qmd"
+        qmd = os.path.join(os.getcwd(), outdir, "workflow", "report", "align_bxstats.qmd")

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
harpy/snakefiles/phase.smk (1)

268-288: LGTM! Consider using Python's shutil.copy for consistency.

The changes correctly integrate the SCSS styling and handle QMD file processing. For consistency with the report_config rule, consider using Python's shutil.copy instead of the shell cp command.

-        cp {input.qmd} {output.qmd}
+    run:
+        import shutil
+        shutil.copy(input.qmd, output.qmd)
         INFILE=$(realpath {input.data})
         quarto render {output.qmd} --log {log} --quiet -P blockfile:$INFILE
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31b331c and 35437ea.

📒 Files selected for processing (2)
  • harpy/_misc.py (2 hunks)
  • harpy/snakefiles/phase.smk (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
harpy/_misc.py

94-94: Do not use bare except

(E722)


99-99: f-string without any placeholders

Remove extraneous f prefix

(F541)


108-108: Do not use bare except

(E722)


113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Build/Cache Container
🔇 Additional comments (3)
harpy/snakefiles/phase.smk (1)

256-267: LGTM! Clean implementation of the configuration file handling.

The changes effectively handle both YAML and SCSS files using a concise Python implementation. The use of temporary files and shutil.copy is appropriate for this task.

harpy/_misc.py (2)

8-8: LGTM!

The urllib.request import is correctly placed and necessary for the new GitHub file retrieval functionality.


89-115: Previous review comments are still applicable.

The suggested improvements from previous reviews regarding error handling, code duplication, and error message consistency are still valid.

Add file operation safeguards.

The current implementation could lead to race conditions or partial writes if there are network interruptions during file downloads. Consider using temporary files and atomic moves for safer file operations.

 def fetch_config_file(file_name, github_path, workdir):
     try:
         destination = f"{workdir}/report/{file_name}"
+        temp_destination = f"{destination}.tmp"
         with urllib.request.urlopen(github_path) as response, \
-             open(destination, 'w') as out_file:
+             open(temp_destination, 'w') as out_file:
             out_file.write(response.read().decode("utf-8"))
+        os.replace(temp_destination, destination)
     except Exception as e:
         if os.path.exists(temp_destination):
             os.unlink(temp_destination)
🧰 Tools
🪛 Ruff (0.8.2)

94-94: Do not use bare except

(E722)


99-99: f-string without any placeholders

Remove extraneous f prefix

(F541)


108-108: Do not use bare except

(E722)


113-113: f-string without any placeholders

Remove extraneous f prefix

(F541)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (6)
harpy/reports/leviathan.qmd (2)

125-137: LGTM! The datatable configuration enhances report appearance.

The addition of fillContainer = T improves the table's responsiveness and visual presentation.

Consider using TRUE instead of T for better code readability and consistency with R best practices:

-  fillContainer = T,
+  fillContainer = TRUE,

148-160: LGTM! Consistent datatable configuration applied.

The same responsive layout enhancement is consistently applied to the variants per contig table.

Similarly, consider using TRUE instead of T:

-  fillContainer = T,
+  fillContainer = TRUE,
harpy/reports/align_bxstats.qmd (1)

92-93: Consider using tidyr::replace_na for better readability.

The current NA handling could be more idiomatic using tidyr's functions.

-aggregate_df[is.na(aggregate_df)] <- 0
+aggregate_df <- aggregate_df %>% replace_na(list(
+  totalreads = 0,
+  totaluniquemol = 0,
+  # ... other columns ...
+))
.github/workflows/tests.yml (1)

106-119: Consider cleaning up commented singularity artifact sections.

Multiple jobs contain identical commented-out sections for downloading singularity artifacts. If these sections are no longer needed, consider removing them to improve code maintainability. If they're planned for future use, consider adding a TODO comment explaining the plan.

-#      - name: Download Singularity Artifact
-#        uses: actions/download-artifact@v4
-#        with:
-#          name: deps-image
-#          path: .snakemake/singularity

Also applies to: 147-151, 185-189, 227-231, 267-271, 304-308, 342-346, 379-384, 418-422, 459-463, 500-504, 542-546, 589-594, 633-637, 684-689, 743-747, 783-787

harpy/reports/hapcut.qmd (2)

263-280: Reduce code duplication by extracting the headerCallback function.

The headerCallback function implementation is duplicated between per-contig and per-sample tables. Consider extracting it into a reusable function.

Create a helper function at the beginning of the file:

create_header_callback <- function(column_description) {
  c(
    "function(thead, data, start, end, display){",
    paste0(" var tooltips = ['", paste(column_description, collapse = "','"), "'];"),
    "  for(var i=0; i<tooltips.length; i++){",
    "    $('th:eq('+i+')',thead).attr('title', tooltips[i]);",
    "  }",
    "}"
  )
}

Then use it for both tables:

- headerCallback <- c(
-   "function(thead, data, start, end, display){",
-   paste0(" var tooltips = ['", paste(column_description, collapse = "','"), "'];"),
-   "  for(var i=0; i<5; i++){",
-   "    $('th:eq('+i+')',thead).attr('title', tooltips[i]);",
-   "  }",
-   "}"
- )
+ headerCallback <- create_header_callback(column_description)

Also applies to: 322-339


263-271: Standardize tooltip descriptions between tables.

The descriptions for similar columns in per-contig and per-sample tables have slight variations. Consider standardizing them for consistency.

For example:

  • Per-contig: "total haplotypes on the contig"
  • Per-sample: "number of haplotypes"

Consider using consistent phrasing across both tables.

Also applies to: 322-330

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 19ea4c4 and 2055ece.

📒 Files selected for processing (11)
  • .github/workflows/tests.yml (18 hunks)
  • harpy/reports/align_bxstats.qmd (4 hunks)
  • harpy/reports/align_stats.qmd (5 hunks)
  • harpy/reports/bcftools_stats.qmd (2 hunks)
  • harpy/reports/bx_count.qmd (4 hunks)
  • harpy/reports/hapcut.qmd (4 hunks)
  • harpy/reports/impute.qmd (3 hunks)
  • harpy/reports/leviathan.qmd (2 hunks)
  • harpy/reports/leviathan_pop.qmd (3 hunks)
  • harpy/reports/preflight_bam.qmd (1 hunks)
  • harpy/reports/preflight_fastq.qmd (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • harpy/reports/preflight_bam.qmd
🚧 Files skipped from review as they are similar to previous changes (6)
  • harpy/reports/impute.qmd
  • harpy/reports/leviathan_pop.qmd
  • harpy/reports/bx_count.qmd
  • harpy/reports/preflight_fastq.qmd
  • harpy/reports/align_stats.qmd
  • harpy/reports/bcftools_stats.qmd
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: simulate linkedreads
  • GitHub Check: qc
  • GitHub Check: preflight
  • GitHub Check: phase
  • GitHub Check: sv naibr
  • GitHub Check: mpileup
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: freebayes
  • GitHub Check: harpy extras
  • GitHub Check: align EMA
  • GitHub Check: demux gen1
  • GitHub Check: metassembly
  • GitHub Check: deconvolve
  • GitHub Check: Rebuild Container
  • GitHub Check: align BWA
🔇 Additional comments (8)
harpy/reports/align_bxstats.qmd (5)

28-35: LGTM! Well-implemented utility function.

The avg_percent_of function is a clean implementation that correctly handles zero values and provides precise percentage calculations.


110-171: LGTM! Value boxes enhance report readability.

The addition of value boxes provides a clear, at-a-glance summary of key metrics. The implementation correctly handles edge cases by filtering out zero values when calculating averages.


176-205: LGTM! Excellent documentation with tooltips.

The implementation of column descriptions and tooltips significantly improves the user experience by providing clear explanations for each metric.


224-226: LGTM! Figure height adjustment aligns with PR objectives.

The change from 1 + (0.4 * n_samples) to 3 + (0.4 * n_samples) provides more vertical space for per-sample plots, improving visibility as intended in the PR objectives.


271-336: LGTM! Informative new visualizations.

The addition of "Reads per molecule" distribution and "Linked-read library performance" scatter plot provides valuable insights into library quality and sequencing performance. The implementation includes proper error handling for cases with insufficient unique values.

.github/workflows/tests.yml (2)

58-60: LGTM! Job name and description changes are clear and descriptive.

The renaming from pkgbuild to containerize better reflects the job's purpose, and the description update from "Build/Cache Container" to "Rebuild Container" is more precise.


121-800:

Details

✅ Verification successful

Verify the impact of removing container build dependencies.

The removal of the pkgbuild (now containerize) dependency from multiple jobs could improve workflow execution time through increased parallelization. However, please verify that these jobs can safely run without waiting for the container to be built.

Run this script to check if any of these jobs use the container:


🏁 Script executed:

#!/bin/bash
# Description: Check for container usage in job definitions

# Search for container references in job steps
rg -A 5 'container:|image:|uses: docker' 

# Search for singularity/apptainer references
rg -A 5 'singularity|apptainer'

Length of output: 60293


Container Dependency Verification Confirmed – Dependency Removal is Safe

Our analysis indicates that none of the jobs in the tests workflow require an actively built container. In the repository’s Snakefiles and harpy modules, container usage is consistently set to “None” (or conditionally bypassed via “conda” vs. “conda apptainer” logic), confirming that the former dependency on container building (previously via pkgbuild) is not essential for these jobs. This change should improve parallelization without impacting functionality.

• Verify that no production workflow unexpectedly depends on container builds.
• Confirm that tests executing harpy commands continue to run as expected in your CI environment.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 312-312: trailing spaces

(trailing-spaces)


[error] 429-429: trailing spaces

(trailing-spaces)

harpy/reports/hapcut.qmd (1)

1-386: LGTM! The implementation aligns well with the PR objectives.

The changes successfully enhance the report's functionality by adding informative tooltips to the data tables, improving user experience and data interpretation. The code is well-documented and organized.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
harpy/reports/align_bxstats.qmd (1)

272-337: Enhanced visualization with new sections and plots.

The new sections for "Reads per molecule" and "Linked-read library performance" provide valuable insights with well-implemented visualizations. The plots include appropriate tooltips, titles, and export options.

Consider adding a brief description of what constitutes "good" performance metrics to help users interpret the results.

 This scatterplot is a diagnostic that shows the relationship between the proportion of nonsingleton
 reads (reads with linked information) compared to total reads. Each point is a sample and is colored by the
 mean number of reads per molecule for that sample.
+
+Generally, good performance is indicated by:
+- A high proportion of reads with linked-read information (>0.7)
+- Consistent mean reads per molecule across samples (typically 8-12 reads)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2055ece and a546e85.

📒 Files selected for processing (6)
  • harpy/reports/align_bxstats.qmd (4 hunks)
  • harpy/reports/align_stats.qmd (5 hunks)
  • harpy/reports/bcftools_stats.qmd (2 hunks)
  • harpy/reports/bx_count.qmd (4 hunks)
  • harpy/reports/hapcut.qmd (4 hunks)
  • harpy/reports/impute.qmd (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • harpy/reports/impute.qmd
  • harpy/reports/hapcut.qmd
  • harpy/reports/align_stats.qmd
  • harpy/reports/bx_count.qmd
  • harpy/reports/bcftools_stats.qmd
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: simulate linkedreads
  • GitHub Check: qc
  • GitHub Check: preflight
  • GitHub Check: phase
  • GitHub Check: sv naibr
  • GitHub Check: mpileup
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: demux gen1
  • GitHub Check: freebayes
  • GitHub Check: harpy extras
  • GitHub Check: align EMA
  • GitHub Check: metassembly
  • GitHub Check: deconvolve
  • GitHub Check: align BWA
  • GitHub Check: Rebuild Container
🔇 Additional comments (4)
harpy/reports/align_bxstats.qmd (4)

28-35: LGTM! Well-implemented utility function.

The avg_percent_of function is a clean implementation that correctly handles zero values and provides accurate percentage calculations.


110-172: Great addition of value boxes for key metrics!

The value boxes provide an excellent high-level overview of important statistics. The implementation correctly handles edge cases by filtering out zero values.


177-206: Excellent enhancement with tooltips for column descriptions!

The header tooltips significantly improve user experience by providing clear descriptions for each column. The implementation is clean and maintainable.


225-227: Improved figure height calculation aligns with PR objectives.

The change from 1 + (0.4 * n_samples) to 3 + (0.4 * n_samples) provides better vertical space for per-sample plots, which directly addresses one of the PR objectives.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
harpy/_cli_types_params.py (2)

193-205: Consider simplifying NAIBR parameter validation.

The current implementation uses index-based validation for argument-value pairs, which can be fragile. Consider using a more robust approach by collecting pairs first.

Here's a suggested improvement:

-        for idx,i in enumerate(shellsplit(value)):
-            if i.startswith("-"):
-                self.fail(f"{i} begins with a dash, which is the wrong format. Try using " + i.lstrip("-") + " VAL instead", param, ctx)
-            # if it's an even index, it's the argument name of an arg-val pair
-            if idx % 2 == 0:
-                opts += 1
-                if i in harpy_options:
-                    self.fail(f"{i} is already used by Harpy when calling naibr.", param, ctx)
-                if i not in valid_options:
-                    self.fail(f"{i} is not a valid naibr option. See the naibr documentation for a list of available options: {docs}.", param, ctx)
+        args = shellsplit(value)
+        if len(args) % 2 != 0:
+            self.fail("NAIBR options must be provided in pairs of ARG VAL", param, ctx)
+        
+        for arg, _ in zip(args[::2], args[1::2]):
+            if arg.startswith("-"):
+                self.fail(f"{arg} begins with a dash, which is the wrong format. Try using {arg.lstrip('-')} VAL instead", param, ctx)
+            
+            opts += 1
+            if arg in harpy_options:
+                self.fail(f"{arg} is already used by Harpy when calling naibr.", param, ctx)
+            if arg not in valid_options:
+                self.fail(f"{arg} is not a valid naibr option. See the naibr documentation for a list of available options: {docs}.", param, ctx)

245-249: Remove unnecessary blank lines at the end of file.

There are 5 unnecessary blank lines at the end of the file.

-
-
-
-
-
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a546e85 and 6871686.

📒 Files selected for processing (1)
  • harpy/_cli_types_params.py (13 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: simulate linkedreads
  • GitHub Check: qc
  • GitHub Check: preflight
  • GitHub Check: phase
  • GitHub Check: sv naibr
  • GitHub Check: mpileup
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: freebayes
  • GitHub Check: harpy extras
  • GitHub Check: align EMA
  • GitHub Check: demux gen1
  • GitHub Check: metassembly
  • GitHub Check: align BWA
  • GitHub Check: deconvolve
  • GitHub Check: Rebuild Container
🔇 Additional comments (1)
harpy/_cli_types_params.py (1)

14-14: Great improvement in command-line argument parsing!

The switch from value.split() to shellsplit across all parameter validation classes is a significant improvement that:

  • Properly handles quoted strings containing spaces
  • Correctly processes escaped characters
  • Follows shell-like syntax rules
  • Prevents argument injection vulnerabilities

This change makes the parameter parsing more robust and secure.

Also applies to: 33-33, 52-52, 71-71, 92-92, 111-111, 133-133, 155-155, 174-174, 193-193, 215-215, 234-234

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
harpy/snakefiles/impute.smk (1)

29-39: LGTM! Good parameter management.

The introduction of a static dictionary for STITCH arguments improves code organization and maintainability. Consider adding a comment explaining the significance of the default values, particularly for niterations=40 and switchModelIteration=39.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6871686 and 666d188.

📒 Files selected for processing (4)
  • harpy/_cli_types_params.py (21 hunks)
  • harpy/impute.py (1 hunks)
  • harpy/scripts/stitch_impute.R (0 hunks)
  • harpy/snakefiles/impute.smk (5 hunks)
💤 Files with no reviewable changes (1)
  • harpy/scripts/stitch_impute.R
⏰ Context from checks skipped due to timeout of 90000ms (18)
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: simulate linkedreads
  • GitHub Check: qc
  • GitHub Check: preflight
  • GitHub Check: phase
  • GitHub Check: mpileup
  • GitHub Check: sv naibr
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: harpy extras
  • GitHub Check: freebayes
  • GitHub Check: align EMA
  • GitHub Check: demux gen1
  • GitHub Check: metassembly
  • GitHub Check: Rebuild Container
  • GitHub Check: align BWA
  • GitHub Check: deconvolve
🔇 Additional comments (7)
harpy/impute.py (1)

54-54: LGTM! Clear and helpful docstring update.

The docstring now provides clear guidance on the required format for STITCH parameters, which aligns with the parameter validation in StitchParams class.

harpy/snakefiles/impute.smk (3)

91-94: LGTM! Well-structured rule updates.

The changes improve the rule by:

  • Simplifying input handling
  • Properly managing temporary directories
  • Using the centralized parameter management system

Also applies to: 100-101, 105-115


142-152: LGTM! Improved report configuration handling.

The changes enhance the report generation system by:

  • Supporting both YAML and SCSS files
  • Using Python's built-in file operations instead of shell commands
  • Aligning with the PR's objective to enhance report generation

248-249: LGTM! Consistent report handling.

The changes maintain consistency with the updated report generation system introduced in the report_config rule.

Also applies to: 254-255

harpy/_cli_types_params.py (3)

4-5: LGTM! Import statements fixed.

The import statements have been correctly updated to use from shlex import syntax.


133-154: LGTM! Enhanced parameter validation.

The changes improve the StitchParams class by:

  • Enforcing stricter parameter format validation
  • Providing more descriptive error messages
  • Maintaining consistency with the docstring in impute.py

15-24: LGTM! Consistent parameter handling improvements.

The changes enhance all parameter classes by:

  • Using shellsplit for proper command-line argument parsing
  • Using shellquote for proper shell escaping
  • Maintaining consistency across all classes

Also applies to: 34-43, 53-62, 72-81, 93-102, 112-127, 137-152, 164-173, 183-192, 202-214, 224-233, 243-252

@pdimens pdimens merged commit 9b923ad into main Feb 14, 2025
19 checks passed
@pdimens pdimens deleted the refine_quarto branch February 14, 2025 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant