Skip to content

use smaller pixi containers#264

Merged
pdimens merged 33 commits intomainfrom
pixi_container
Oct 22, 2025
Merged

use smaller pixi containers#264
pdimens merged 33 commits intomainfrom
pixi_container

Conversation

@pdimens
Copy link
Copy Markdown
Owner

@pdimens pdimens commented Oct 20, 2025

This PR replaces the one big snakemake-created containerized container in favor of multiple smaller and env_tagged containers that have the software installed using multistaged pixi builds within them.

Summary by CodeRabbit

  • New Features

    • Pipeline steps now run in environment-specific container images; CI builds and publishes per-environment images via a matrix.
    • New diagnostic CLI group with subcommands for dry-run, direct Snakemake execution, and replaying failing rules.
  • Chores

    • CI migrated to a Pixi-based build flow with per-environment manifests and workspace manifest.
    • Workflow initialization updated to carry explicit container context.
  • Removals

    • Removed legacy singleton-separation script and obsolete container-rebuild workflow blocks.
  • Behavior

    • Resume/conda flag removed; container handling relocated to workflow construction and setup flow.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 20, 2025

📝 Walkthrough

Walkthrough

Replaces Snakemake-based container generation with a Pixi-driven per-environment Dockerfile generator and CI matrix builds; adds per-rule container directives across many Snakefiles; changes Workflow API to accept a container flag and removes fetch_snakefile; deletes a singleton-separation script; updates CI, packaging, and report loading behavior.

Changes

Cohort / File(s) Change Summary
CI: release workflow
/.github/workflows/createrelease.yml
Renamed job to build_versioned_containers, added environment matrix, switched to prefix-dev/setup-pixi@v0.9.2, changed build context to container/${{ matrix.env }}, updated image tags to matrix-driven values, and made build_tarball depend on container builds.
CI: tests workflow
/.github/workflows/tests.yml
Removed micromamba-based build/rebuild and Docker push blocks; replaced many Harpy installs with prefix-dev/setup-pixi@v0.9.2, simplified job dependencies to needs: changes, and added a disk-clean step for assembly.
Pixi generator & workspace
harpy/common/create_pixi.py, pixi.toml, /.gitignore
Added create_pixi_dockerfiles() to emit per-environment container/{env} Dockerfiles and pixi manifests (runs pixi init/pixi add), created pixi.toml, and narrowed .gitignore to pixi.lock.
Environments command
harpy/commands/environments.py
containerize() now calls create_pixi_dockerfiles() (removed previous Snakemake/subprocess Dockerfile generation and related imports); updated docstring and workflow constructor invocation.
Workflow core & file ops
harpy/common/workflow.py, harpy/common/file_ops.py
Workflow.__init__ gained a container parameter and public container attribute; setup_snakemake no longer accepts container; removed fetch_snakefile and adjusted snakefile copying/error messages and startup printing.
Conda/env metadata & packaging
harpy/common/conda.py, resources/harpy.yaml, resources/meta.yaml
Simplified simulations deps (keep bioconda::simug>1.0.0), removed separate spades entry and resources/harpy.yaml, bumped rich-click and snakemake-minimal in meta.yaml, and removed several CLI entrypoints.
Command callsites
harpy/commands/*.py (examples)
harpy/commands/align.py, assembly.py, deconvolve.py, demultiplex.py, impute.py, phase.py, qc.py, snp.py, sv.py, validate.py, resume.py, metassembly.py, simulate_variants.py
Updated many command entrypoints to pass container into Workflow constructor and to call setup_snakemake(threads, hpc, snakemake) (removed container arg); resume CLI removed --conda and adapts environment checks/config loading.
Snakefiles — per-rule containerization
harpy/snakefiles/* (examples)
align_bwa.smk, align_strobe.smk, assembly.smk, metassembly.smk, deconvolve.smk, demultiplex_meier2021.smk, phase.smk, qc.smk, simulate_*.smk, snp_*.smk, sv_*.smk, validate_*.smk, environments.smk
Added many per-rule container: "docker://pdimens/harpy:{env}_latest" directives, removed many container: None and top-level containerized defaults, and adjusted some conda env refs (e.g., envs/assembly.yaml).
Removed script
harpy/scripts/separate_singletons.py
Deleted singleton-separation script and its CLI, samtools sorting, and pysam buffering logic.
System ops tweak
harpy/common/system_ops.py
Reordered apptainer check in container_ok to return early when apptainer is present, changing control-flow ordering relative to linux checks.
Reports: QMD/R code blocks
harpy/reports/*.qmd
Replaced dynamic on-demand R package installer/loading helpers with explicit library() calls across many report files; added results = F to some package import chunks.
Printing helper
harpy/common/printing.py
Added print_shellcmd_simple(text) helper to render shell commands in a stylized table with Bash syntax highlighting.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Harpy_CLI
    participant create_pixi
    participant Pixi_CLI
    participant FS

    User->>Harpy_CLI: harpy containerize()
    Harpy_CLI->>create_pixi: create_pixi_dockerfiles()
    create_pixi->>FS: remove existing container/ directory
    loop per environment
        create_pixi->>FS: write container/{env}/Dockerfile & pixi.toml
        create_pixi->>Pixi_CLI: pixi init (channels)
        create_pixi->>Pixi_CLI: pixi add (deps -> container/{env}/pixi.toml)
        create_pixi->>FS: remove container/.pixi cache
    end
    create_pixi-->>Harpy_CLI: done
Loading
sequenceDiagram
    participant GitHub_Actions
    participant Matrix_Job
    participant setup-pixi
    participant Buildx
    participant Registry

    GitHub_Actions->>Matrix_Job: trigger per-environment build (matrix)
    Matrix_Job->>setup-pixi: prefix-dev/setup-pixi@v0.9.2 (activate)
    setup-pixi-->>Matrix_Job: pixi available
    Matrix_Job->>Buildx: docker/build-push (context: container/${{ matrix.env }})
    Buildx->>Registry: push pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }}
    Registry-->>Matrix_Job: image pushed
    Matrix_Job-->>GitHub_Actions: job complete
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

enhancement

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The pull request title "use smaller pixi containers" directly and concisely summarizes the main architectural change in this changeset. The PR replaces a single large Snakemake-created container with multiple smaller, environment-specific containers built using pixi, as evidenced by the new create_pixi.py module, matrix-driven CI workflow changes, and per-rule container directives across numerous snakefiles. The title captures the essential shift in container strategy and clearly indicates the use of pixi as the new tooling. A teammate scanning git history would immediately understand that this PR introduces a new containerization approach with smaller, more specialized images.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pixi_container

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a651dbe and 34afed8.

📒 Files selected for processing (1)
  • harpy/snakefiles/environments.smk (1 hunks)
🔇 Additional comments (2)
harpy/snakefiles/environments.smk (2)

2-2: Environment list consolidation looks good.

The consolidated environment definition with an explicit default list improves clarity and removes the dynamic "spades" logic, aligning with the PR's standardization goals.


8-11: Removal of global container directive is correct.

The removal of container: "docker://pdimens/harpy:latest" from this rule aligns with the PR objective of transitioning from a single global container to per-rule containerization. This environment management rule appropriately focuses on conda environment creation without container orchestration.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
harpy/snakefiles/metassembly.smk (2)

1-1: Remove the global containerized declaration.

This file still contains containerized: "docker://pdimens/harpy:latest" at the top level, which is inconsistent with the PR's objective to move to per-rule container specifications. Other workflow files in this PR have removed their global container declarations.

Apply this diff to remove the global declaration:

-containerized: "docker://pdimens/harpy:latest"
-

108-112: Inconsistent container strategy in metassembly workflow.

The spades_assembly rule still has container: None while other assembly-related rules in this PR are being containerized. This file appears to be incompletely migrated to the per-rule container strategy. Consider either:

  1. Adding an appropriate container specification to this rule, or
  2. Documenting why this rule should remain uncontainerized
🧹 Nitpick comments (4)
harpy/common/create_pixi.py (3)

101-110: Consider adding error handling for file operations.

The reset_pixi_global() function assumes the ~/.pixi directory structure exists and uses glob.glob() which returns an empty list if no matches are found. While this is safe, consider adding explicit directory existence checks for better error messages if the expected structure is missing.

 def reset_pixi_global():
     # remove any existing global packages
     pixidir = os.environ['HOME'] + "/.pixi"
+    if not os.path.exists(pixidir):
+        return  # Nothing to clean up
     for f in glob.glob(pixidir + "/bin/*"):

140-142: Remove debug print statement.

Line 142 contains a print(global_manifest) statement that appears to be debug code. Consider removing it or converting it to proper logging.

     _pix = subprocess.run("pixi global list".split(), capture_output = True, text = True)
     global_manifest = _pix.stdout.splitlines()[0].split()[-1].strip("\'")
-    print(global_manifest)

155-167: Capture stderr on subprocess failure for better debugging.

When subprocess.run() fails, the code prints _pix.stderr but the subprocess was not configured to capture stderr. Add capture_output=True or stderr=subprocess.PIPE to the subprocess calls to actually capture error messages.

         if env == "deconvolution":
             _pix = subprocess.run(
-                ["pixi", "global", "install", *_chan, "--environment", env, "--expose", "QuickDeconvolution", *deps]
+                ["pixi", "global", "install", *_chan, "--environment", env, "--expose", "QuickDeconvolution", *deps],
+                capture_output=True, text=True
             )
             if _pix.returncode > 0:
                 print(_pix.stderr)
                 sys.exit(1)
         else:
             _pix = subprocess.run(
-                ["pixi", "global", "install", *_chan, "--environment", env, *deps]
+                ["pixi", "global", "install", *_chan, "--environment", env, *deps],
+                capture_output=True, text=True
             )
harpy/commands/environments.py (1)

18-19: Consider removing or documenting the commented-out line.

The commented-out call to create_pixi_toml() should either be removed if it's no longer needed, or documented with a TODO/FIXME comment explaining why it's preserved.

Apply this diff to remove the dead code:

     create_pixi_dockerfiles()
-    #create_pixi_toml()

Or if it's needed for future work, document it:

     create_pixi_dockerfiles()
-    #create_pixi_toml()
+    # TODO: Re-enable create_pixi_toml() once [reason]
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 189b684 and 957f3b7.

📒 Files selected for processing (29)
  • .github/workflows/createrelease.yml (3 hunks)
  • .github/workflows/tests.yml (1 hunks)
  • harpy/commands/environments.py (1 hunks)
  • harpy/common/conda.py (1 hunks)
  • harpy/common/create_pixi.py (1 hunks)
  • harpy/common/workflow.py (1 hunks)
  • harpy/snakefiles/align_bwa.smk (5 hunks)
  • harpy/snakefiles/align_strobe.smk (4 hunks)
  • harpy/snakefiles/assembly.smk (5 hunks)
  • harpy/snakefiles/deconvolve.smk (1 hunks)
  • harpy/snakefiles/demultiplex_meier2021.smk (3 hunks)
  • harpy/snakefiles/environments.smk (1 hunks)
  • harpy/snakefiles/impute.smk (0 hunks)
  • harpy/snakefiles/metassembly.smk (1 hunks)
  • harpy/snakefiles/phase.smk (4 hunks)
  • harpy/snakefiles/qc.smk (3 hunks)
  • harpy/snakefiles/simulate_snpindel.smk (2 hunks)
  • harpy/snakefiles/simulate_variants.smk (2 hunks)
  • harpy/snakefiles/snp_freebayes.smk (2 hunks)
  • harpy/snakefiles/snp_mpileup.smk (1 hunks)
  • harpy/snakefiles/sv_leviathan.smk (4 hunks)
  • harpy/snakefiles/sv_leviathan_pop.smk (5 hunks)
  • harpy/snakefiles/sv_naibr.smk (2 hunks)
  • harpy/snakefiles/sv_naibr_phase.smk (3 hunks)
  • harpy/snakefiles/sv_naibr_pop.smk (3 hunks)
  • harpy/snakefiles/sv_naibr_pop_phase.smk (5 hunks)
  • harpy/snakefiles/validate_bam.smk (1 hunks)
  • harpy/snakefiles/validate_fastq.smk (1 hunks)
  • resources/Dockerfile (1 hunks)
💤 Files with no reviewable changes (1)
  • harpy/snakefiles/impute.smk
🧰 Additional context used
🧬 Code graph analysis (13)
harpy/snakefiles/simulate_variants.smk (1)
harpy/common/summaries.py (2)
  • simulate_snpindel (331-396)
  • simulate_variants (407-448)
harpy/common/workflow.py (1)
harpy/commands/environments.py (1)
  • container (68-79)
harpy/common/conda.py (1)
harpy/common/summaries.py (2)
  • simulate_variants (407-448)
  • simulate_snpindel (331-396)
harpy/snakefiles/snp_mpileup.smk (1)
harpy/common/summaries.py (1)
  • snp_mpileup (479-507)
harpy/snakefiles/demultiplex_meier2021.smk (1)
harpy/common/summaries.py (1)
  • demultiplex_meier2021 (148-181)
harpy/common/create_pixi.py (1)
harpy/commands/environments.py (1)
  • deps (22-29)
harpy/snakefiles/metassembly.smk (1)
harpy/common/summaries.py (2)
  • metassembly (231-267)
  • assembly (78-124)
harpy/snakefiles/assembly.smk (1)
harpy/common/summaries.py (2)
  • metassembly (231-267)
  • assembly (78-124)
harpy/snakefiles/sv_leviathan_pop.smk (1)
harpy/common/summaries.py (1)
  • leviathan (539-571)
harpy/snakefiles/sv_naibr_pop.smk (1)
harpy/common/summaries.py (1)
  • naibr_pop (656-691)
harpy/snakefiles/simulate_snpindel.smk (1)
harpy/commands/simulate_variants.py (2)
  • snpindel (44-121)
  • cnv (232-296)
harpy/commands/environments.py (1)
harpy/common/create_pixi.py (2)
  • create_pixi_dockerfiles (77-98)
  • create_pixi_toml (133-172)
harpy/snakefiles/phase.smk (1)
harpy/common/summaries.py (1)
  • phase (269-309)
🪛 actionlint (1.7.8)
.github/workflows/createrelease.yml

91-91: property "version" is not defined in object type {env: string}

(expression)

🪛 Ruff (0.14.1)
harpy/common/create_pixi.py

140-140: subprocess call: check for execution of untrusted input

(S603)


155-155: subprocess call: check for execution of untrusted input

(S603)


156-156: Starting a process with a partial executable path

(S607)


162-162: subprocess call: check for execution of untrusted input

(S603)


163-163: Starting a process with a partial executable path

(S607)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: assembly
  • GitHub Check: deconvolve
  • GitHub Check: sv naibr
  • GitHub Check: mpileup
  • GitHub Check: align BWA
  • GitHub Check: freebayes
  • GitHub Check: validate
  • GitHub Check: phase
  • GitHub Check: sv leviathan
  • GitHub Check: align strobe
  • GitHub Check: impute
  • GitHub Check: qc
  • GitHub Check: simulate variants
  • GitHub Check: demux meier2021
🔇 Additional comments (34)
harpy/snakefiles/validate_bam.smk (1)

73-74: LGTM! Container specification added for report generation.

The addition of the report_latest container for the create_report rule aligns with the PR's containerization strategy. The rule retains the conda environment specification as a fallback.

harpy/snakefiles/simulate_snpindel.smk (2)

97-98: LGTM! Containerization for simulation rules.

The addition of simulations_latest container specifications to both simulate_haploid and simulate_diploid rules is consistent with the PR's per-rule containerization approach.


169-170: LGTM! Consistent container specification.

The simulate_diploid rule correctly uses the same simulations_latest container as simulate_haploid, ensuring consistent execution environments.

.github/workflows/tests.yml (1)

614-615: LGTM! Disk space optimization for assembly tests.

Adding the disk cleanup step before assembly tests is a sensible optimization, as assembly operations can be disk-intensive. The jlumbroso/free-disk-space action is a well-maintained solution for this purpose.

harpy/snakefiles/validate_fastq.smk (1)

88-89: LGTM! Consistent containerization for report generation.

The container specification matches the identical change in validate_bam.smk, ensuring consistent report generation across validation workflows.

harpy/snakefiles/sv_naibr_pop_phase.smk (3)

140-141: LGTM! Appropriate containerization for phasing step.

The phase_alignments rule correctly uses the phase_latest container, aligning with its phasing functionality.


233-234: LGTM! Appropriate containerization for variant calling.

The call_variants rule correctly uses the variants_latest container, matching its variant calling functionality.


317-318: LGTM! Consistent containerization for reporting rules.

Both sample_reports and aggregate_report rules correctly use the report_latest container, ensuring consistent report generation across the workflow.

Also applies to: 346-347

harpy/common/workflow.py (1)

84-84: I'll perform a broader search of the codebase and check Snakemake documentation to verify this concern.<!-- [scratchpad_update]
[observations]
Web search results show:

  • Snakemake documentation references --software-deployment-method as a command-line option
  • Results show examples like "--software-deployment-method conda" and "--software-deployment-method conda apptainer" (list of two values)
  • Result 5 mentions: "deprecates --use-conda and --use-singularity in favor of --software-deployment-method conda or --software-deployment-method apptainer and --software-deployment-method conda apptainer"
  • Result 9 example: "snakemake --software-deployment-method conda apptainer" (space-separated values)
  • Result 2 shows "--software-deployment-method conda" (single value)
  • No examples show YAML profile format with this parameter yet

[analysis]

  • Snakemake documentation shows both single values (e.g., "conda") and multiple values (e.g., "conda apptainer") are accepted
  • Need to verify YAML profile format specifically and whether string vs list format is valid in YAML profiles
  • Need to look at the actual code in harpy/common/workflow.py to see context

[pending]

  • Check harpy/common/workflow.py context at line 84
  • Search for other software-deployment-method usage in this codebase
  • Verify YAML profile format handling
    [scratchpad_end] -->

Now let me search the codebase to understand the context and find other usages of this parameter:<!-- [scratchpad_update]
[observations]
Still no direct YAML profile example showing software-deployment-method format

  • Web results show profile YAML format: "key: value" pairs
  • Example: "cluster: qsub", "use-conda: True", etc.
  • Command-line examples show both single and multiple values: "--software-deployment-method conda" and "--software-deployment-method conda apptainer"
  • Documentation shows parameter can accept both formats on command line, but profile YAML format not explicitly shown

[pending]

  • Need to check harpy/common/workflow.py file to see full context
  • Need to search codebase for any existing software-deployment-method usage
  • Verify if string or list format is expected in YAML profiles by checking examples

[actions]

  1. Search harpy/common/workflow.py for context
  2. Search entire codebase for software-deployment-method
  3. Look for profile YAML files or examples
    [scratchpad_end] -->

Now let me search the repository for the actual file and context:<!-- [scratchpad_update]
[observations]
Key finding in result index 22-11:
"--software-deployment-method {apptainer,conda,env-modules} [{apptainer,conda,env-modules} ...]"
This explicitly shows the CLI accepts: multiple values OR single values (note the optional part in square brackets)

Results also show:

  • MegaLinter config shows the syntax includes optional multiple values
  • Snakemake accepts "conda apptainer" (two values space-separated on command line)
  • Snakemake accepts "conda" (single value)
  • Snakemake accepts "apptainer" (single value)

Still need to see how this translates to YAML profile format.
From result 14-4, 14-15, 14-25: Profile YAML follows command-line format, but unclear if list notation needed

[analysis]
The command-line format shows space-separated values: --software-deployment-method conda apptainer
The question is: should the YAML profile use:

  • "software-deployment-method: apptainer" (string)
  • "software-deployment-method: [apptainer]" (list)
  • Something else?

The code review comment mentions changing from list ["conda", "apptainer"] to a string "apptainer". Need to verify what format Snakemake profiles actually expect.
[scratchpad_end] -->

Based on my web search findings, Snakemake's --software-deployment-method parameter accepts both single and multiple values like {apptainer,conda,env-modules} [{apptainer,conda,env-modules} ...]. The documentation examples show both formats work on the command line: use --software-deployment-method conda or --software-deployment-method apptainer or --software-deployment-method conda apptainer.

However, I couldn't find definitive documentation showing the exact YAML profile format for this parameter (string vs. list), and the codebase search was inconclusive. Let me provide the rewritten review comment based on what was verified:

Verify that Snakemake's profile configuration accepts string format for software-deployment-method.

The --software-deployment-method parameter accepts single values like "conda" or "apptainer" as well as multiple values like "conda apptainer". However, YAML profile syntax for this parameter is not explicitly documented. Verify that the string value "apptainer" (or "conda" if not container-only) is correctly accepted in the profile's YAML configuration, not as a list. If Snakemake requires a list format in YAML profiles, update the code accordingly.

harpy/common/conda.py (1)

65-67: I need to gather more context about the conda environment and verify the actual simulation workflows.Now let me examine the conda.py file and check for active simulation workflows:Based on my analysis of the shell script output and web search results, I now have enough information to provide a comprehensive response. Let me generate the final rewritten review comment.

No simulation workflows depend on mimick; the removed linkedreads command was deprecated with sys.exit(1).

The shell script found that mimick is only referenced in the deprecated linkedreads command in harpy/commands/simulate.py, which is marked deprecated=True and immediately exits with sys.exit(1). Mimick simulates linked-read FASTQ data from various chemistries, allowing simulation of haplotypes and barcodes, while simuG is a tool for simulating genomic variants (SNPs, INDELs, CNVs, inversions, translocations), making it a unique general-purpose genome simulator. These are distinct tools for different purposes. Since the linkedreads command is non-functional and the simulations environment (lines 65-67) contains only simug>1.0.0 (which serves genome variant simulation, not linked-read simulation), the removal of mimick from this environment will not break active simulation workflows.

harpy/snakefiles/deconvolve.smk (2)

56-57: LGTM: Per-rule container specification added.

The deconvolve rule now correctly specifies its own container, aligning with the shift to environment-specific containers.


61-69: ****

The review comment incorrectly assumes that extract_forward rule lacks access to seqtk.

seqtk is available as a global environment dependency (defined in resources/meta.yaml and resources/harpy.yaml). The extract_forward rule requires no explicit container or conda directive to access it—rules without such specifications use the global environment by default.

This pattern is consistent throughout the codebase: the interleave rule in the same file explicitly specifies container: None while using seqtk mergepe, and many other snakefiles use seqtk without explicit container/conda specifications. The absence of a container directive does not prevent access to globally-available tools.

Likely an incorrect or invalid review comment.

harpy/snakefiles/assembly.smk (3)

51-52: LGTM: Assembly rules correctly containerized.

The assembly-related rules (cloudspades, scaffolding, QUAST_assessment, BUSCO_analysis) now appropriately use the assembly_latest container, aligning with the per-environment containerization strategy.

Also applies to: 106-107, 132-133, 154-155


172-173: LGTM: Correct container for reporting.

The build_report rule appropriately uses qc_latest since it runs MultiQC for quality control reporting.


60-67: Verify seqtk availability in the execution environment.

The interleave_fastq rule no longer specifies a container or conda environment. Ensure that seqtk is available in the default execution context.

harpy/snakefiles/simulate_variants.smk (2)

62-63: LGTM: Simulation rules correctly containerized.

The simulate_haploid and simulate_diploid rules now appropriately use the simulations_latest container.

Also applies to: 119-120


39-45: Verify bcftools availability for convert_vcf rule.

The convert_vcf rule (lines 39-45) no longer specifies a container or conda environment. Ensure that bcftools is available in the default execution context.

harpy/snakefiles/phase.smk (2)

131-132: LGTM: Phase rules correctly containerized.

The HapCut2-related rules (extract_hairs, link_fragments, phase) now appropriately use the phase_latest container.

Also applies to: 152-153, 172-173


268-269: LGTM: Correct container for Quarto reporting.

The phase_report rule appropriately uses report_latest for Quarto-based report generation.

harpy/snakefiles/demultiplex_meier2021.smk (3)

71-72: LGTM: Demultiplexing rule correctly containerized.

The demultiplex rule appropriately uses the demultiplex_latest container for the dmox tool.


93-94: LGTM: QC rules correctly containerized.

The assess_quality and quality_report rules appropriately use the qc_latest container for falco and MultiQC tools.

Also applies to: 150-151


38-41: Verify haplotag_acbd availability in the host environment.

The barcode_segments rule explicitly sets container: None, meaning it will execute in the host environment. Ensure that the haplotag_acbd command is available in the execution context.

harpy/snakefiles/snp_mpileup.smk (1)

228-229: LGTM: Correct container for variant reporting.

The variant_report rule appropriately uses report_latest for Quarto-based report generation, consistent with other reporting rules in the codebase.

harpy/common/create_pixi.py (2)

77-98: LGTM: Dockerfile generation logic is sound.

The create_pixi_dockerfiles() function correctly generates per-environment Dockerfiles with appropriate channel configurations and special handling for the deconvolution environment.


140-167: Static analysis security warnings are false positives.

The static analysis tool flags subprocess usage as potentially unsafe (S603, S607). However, in this context:

  • The pixi command is a legitimate build tool
  • All command arguments come from the hardcoded environ dictionary (lines 9-75)
  • No user input is passed to subprocess

These warnings can be safely ignored for this use case.

harpy/snakefiles/metassembly.smk (1)

82-86: Verify that the assembly_latest container exposes the executable as metaspades not metaspades.py.

The codebase shows an inconsistency: the error_correction rule uses metaspades (line 86), while the spades_assembly rule uses metaspades.py (line 112). SPAdes documentation consistently references metaspades.py as the standard executable. Without access to the container image configuration, the executable name change cannot be verified through the repository alone. Confirm that the docker://pdimens/harpy:assembly_latest container either wraps metaspades.py as metaspades or installs SPAdes with that alternative naming.

resources/Dockerfile (1)

1-7: LGTM! Clean pixi-based Dockerfile.

The Dockerfile correctly uses the pixi base image, copies the required manifest, and cleans up the cache to reduce image size.

harpy/snakefiles/align_strobe.smk (1)

85-86: LGTM! Per-rule containers properly specified.

The container specifications follow a consistent pattern and align with the PR's goal of using smaller, environment-specific containers. The _latest tags will be replaced with version tags during the release process.

Also applies to: 224-225, 281-282, 301-302

.github/workflows/createrelease.yml (1)

70-78: Verify the shell directive placement with setup-pixi action.

The shell: micromamba-shell {0} directive appears to be set on a uses: action step (lines 71-76), but shell directives are only applicable to run: steps. The shell directive should likely be moved to line 78 where run: harpy containerize is executed, or the step structure needs adjustment.

Please verify the intended structure. If the shell should apply to the run command, consider restructuring:

      - name: Setup Pixi
        uses: prefix-dev/setup-pixi@v0.9.2
        with:
          pixi-version: v0.57.0
          cache: true
          auth-host: prefix.dev
          auth-token: ${{ secrets.PREFIX_DEV_TOKEN }}
      - name: Recreate container
        shell: micromamba-shell {0}
        run: harpy containerize
harpy/snakefiles/align_bwa.smk (1)

48-49: LGTM! Container specifications consistent with broader changes.

The per-rule container approach is correctly implemented and follows the same pattern as other workflow files in this PR.

Also applies to: 106-107, 245-246, 302-303, 322-323

harpy/snakefiles/qc.smk (1)

60-61: LGTM! QC workflow properly containerized.

The container specifications for QC-related rules follow the established pattern and align with the per-environment container strategy.

Also applies to: 102-103, 128-129

harpy/snakefiles/sv_leviathan.smk (1)

50-51: LGTM! Structural variant workflow properly containerized.

The container specifications correctly segregate alignment, variant calling, and reporting operations into their respective environment-specific containers.

Also applies to: 71-72, 106-107, 200-201

harpy/snakefiles/sv_naibr_phase.smk (1)

117-118: LGTM! Phased variant calling workflow properly containerized.

The container specifications correctly include the phasing environment and follow the established pattern for variant calling and reporting.

Also applies to: 180-181, 264-265

harpy/snakefiles/sv_leviathan_pop.smk (1)

119-120: Based on the script output, the review comment's assessment is accurate and well-justified:

Verified facts:

  • bwa index produces files with extensions amb, ann, bwt, pac and sa
  • sv_leviathan_pop.smk outputs exactly these file extensions (lines 112: .ann, .bwt, .pac, .sa, .amb), confirming it runs bwa index
  • sv_naibr_pop.smk and snp_freebayes.smk use only seqtk seq and samtools faidx with no bwa operations
  • Container difference is present: sv_leviathan_pop has align_latest, while the other three workflows have no container directive

Conclusion:
The container assignment in sv_leviathan_pop is intentional and justified. The align_latest container is correctly applied only to the workflow that requires bwa indexing. The other workflows legitimately require no such container since they only use seqtk and samtools, which don't demand the align environment. This represents correct, differentiated container usage across workflows with distinct tool requirements.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
.github/workflows/createrelease.yml (1)

91-91: Fix the template expression syntax error.

Line 91 still contains a double $$ which is incorrect GitHub Actions YAML syntax. Template expressions should use a single $. This matches the issue flagged in the previous review and needs to be corrected.

Apply this diff:

-          tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }}
+          tags: pdimens/harpy:${{ matrix.env }}_{{ github.ref_name }}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 957f3b7 and 796eb53.

📒 Files selected for processing (1)
  • .github/workflows/createrelease.yml (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: align strobe
  • GitHub Check: qc
  • GitHub Check: assembly
  • GitHub Check: simulate variants
  • GitHub Check: sv naibr
  • GitHub Check: impute
  • GitHub Check: align BWA
  • GitHub Check: mpileup
  • GitHub Check: freebayes
  • GitHub Check: sv leviathan
  • GitHub Check: phase
  • GitHub Check: validate
  • GitHub Check: demux meier2021

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (2)
.github/workflows/createrelease.yml (2)

70-78: Critical: Invalid workflow step structure mixing action and command.

Lines 77-78 incorrectly place shell and run keys at the same indentation level as the uses action. GitHub Actions does not allow mixing action execution (uses) with shell command execution (run) in the same step. This will cause a workflow syntax error.

Apply this diff to split into separate steps:

       - name: Recreate containers
         uses: prefix-dev/setup-pixi@v0.9.2
         with:
           pixi-version: v0.57.0
           cache: true
           auth-host: prefix.dev
           auth-token: ${{ secrets.PREFIX_DEV_TOKEN }}
-        shell: micromamba-shell {0}
-        run: harpy containerize
+      - name: Run harpy containerize
+        shell: micromamba-shell {0}
+        run: harpy containerize

86-91: Critical: Fix template syntax errors in image tag.

Line 91 has two syntax errors:

  1. Double $$ before the first template expression (should be single $)
  2. Missing $ before the second template expression {{ github.ref_name }}

These will cause incorrect tag formatting or workflow failure.

Apply this diff to fix the tag syntax:

       - name: Build and push
         uses: docker/build-push-action@v6
         with:
           context: .container/${{ matrix.env }}
           push: true
-          tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }}
+          tags: pdimens/harpy:${{ matrix.env }}_${{ github.ref_name }}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 796eb53 and 44c175f.

📒 Files selected for processing (3)
  • .github/workflows/createrelease.yml (3 hunks)
  • harpy/commands/environments.py (1 hunks)
  • harpy/common/create_pixi.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
harpy/commands/environments.py (1)
harpy/common/create_pixi.py (1)
  • create_pixi_dockerfiles (99-117)
🪛 Ruff (0.14.1)
harpy/common/create_pixi.py

110-110: subprocess call: check for execution of untrusted input

(S603)


112-112: subprocess call: check for execution of untrusted input

(S603)


114-114: subprocess call: check for execution of untrusted input

(S603)


115-115: Starting a process with a partial executable path

(S607)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: sv naibr
  • GitHub Check: freebayes
  • GitHub Check: assembly
  • GitHub Check: align strobe
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: mpileup
  • GitHub Check: align BWA
  • GitHub Check: phase
  • GitHub Check: simulate variants
  • GitHub Check: qc
  • GitHub Check: demux meier2021
  • GitHub Check: validate
🔇 Additional comments (4)
harpy/commands/environments.py (1)

7-7: LGTM - Clean import for new architecture.

The import correctly references the new pixi-based container generation function.

harpy/common/create_pixi.py (1)

7-73: LGTM - Well-structured environment definitions.

The environment-to-dependency mappings are clear and align with the matrix strategy in the CI workflow.

.github/workflows/createrelease.yml (2)

9-38: LGTM - Clean tarball generation job.

The renamed job properly versions the source files and creates a release artifact.


40-69: LGTM - Matrix strategy aligns with environment definitions.

The matrix values match the environment keys defined in harpy/common/create_pixi.py, ensuring all 11 environments get their own containers.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (1)
.github/workflows/createrelease.yml (1)

76-76: Fix the double dollar sign in the tag template.

The tag contains $${{ matrix.env }} with a double $$, which will produce a literal $ in the output instead of interpolating the variable. This issue was previously flagged but remains unfixed.

Apply this diff to fix the tag:

-          tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }}
+          tags: pdimens/harpy:${{ matrix.env }}_${{ github.ref_name }}

Note: GitHub Actions syntax requires single $ for variable interpolation. Also fixed the second interpolation which was missing the opening $.

🧹 Nitpick comments (1)
.github/workflows/tests.yml (1)

68-72: Consider enabling Pixi cache to improve CI performance.

The Pixi setup step has cache: false in all jobs. Unless intentionally disabled for testing, enabling the cache could significantly speed up workflow runs.

Apply this diff if caching is desired:

       - name: Install Harpy
         uses: prefix-dev/setup-pixi@v0.9.2
         with:
           pixi-version: v0.56.0
-          cache: false
+          cache: true
           activate-environment: true

Apply the same change to all other jobs (lines 89-93, 115-119, 136-140, 157-161, 178-182, 199-203, 224-228, 249-253, 280-284, 312-316, 338-342, 374-378, 418-422, 449-453).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44c175f and 1223f4b.

📒 Files selected for processing (5)
  • .github/workflows/createrelease.yml (2 hunks)
  • .github/workflows/tests.yml (17 hunks)
  • .gitignore (1 hunks)
  • harpy/snakefiles/metassembly.smk (2 hunks)
  • pixi.toml (1 hunks)
🔇 Additional comments (3)
.gitignore (1)

38-38: LGTM! Appropriate specificity for Pixi files.

The change from the broad pixi* pattern to the specific pixi.lock correctly allows tracking pixi.toml while ignoring generated lock files.

.github/workflows/tests.yml (1)

423-424: Verify the disk space action is necessary.

The "Clear Space" step was added specifically to the assembly job. Confirm that disk space was actually a constraint in this job.

harpy/snakefiles/metassembly.smk (1)

1-1: Clarify mixed container execution strategy.

The file has inconsistent container declarations:

  • Line 1: Global containerized: "docker://pdimens/harpy:latest"
  • Line 84: error_correction explicitly uses docker://pdimens/harpy:assembly_latest
  • Line 110: spades_assembly explicitly sets container: None

This creates confusion about which container each rule uses. The spades_assembly rule appears to intentionally opt out of containerization while error_correction uses a specific container.

Verify the intended container usage:

  1. Should spades_assembly run without a container, or is this an oversight?
  2. Why does error_correction use assembly_latest instead of the global latest?
  3. Should there be consistency between the two rules that use the same underlying tool (metaspades)?

Also applies to: 82-86, 108-112

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
harpy/snakefiles/metassembly.smk (1)

105-110: Fix missing conda environment configuration: spades_assembly rule references non-existent envs/assembly.yaml with container: None.

The rule specifies container: None, which requires a valid local conda environment. However, the referenced envs/assembly.yaml file does not exist in the repository. This will cause the rule to fail at runtime.

Compare this to the spades_assembly rule in assembly.smk (line 49-50), which uses the same conda reference but has a valid Docker container specified, avoiding the local conda requirement.

Options to fix:

  • Add a valid Docker container image (e.g., "docker://pdimens/harpy:assembly_latest") to match the pattern used in assembly.smk, OR
  • Create the missing envs/assembly.yaml conda environment file
harpy/commands/demultiplex.py (2)

37-40: Click argument name mismatch will break invocation.

@click.argument uses names 'R12_FQ'/'I12_FQ' but the function expects r12_fq/i12_fq. Click matches by parameter name; this will error at runtime.

Apply either approach; preferred is to make argument names lowercase to match the function:

-@click.argument('R12_FQ', required=True, type=FASTQfile(dir_ok= False), nargs=2)
-@click.argument('I12_FQ', required=True, type=FASTQfile(dir_ok= False), nargs=2)
+@click.argument('r12_fq', required=True, type=FASTQfile(dir_ok= False), nargs=2)
+@click.argument('i12_fq', required=True, type=FASTQfile(dir_ok= False), nargs=2)

44-49: Doc typo: index reads.

Text says “R1, R2, I2, and I2”; should be “R1, R2, I1, and I2”.

-Use the R1, R2, I2, and I2 FASTQ files provided by the sequencing facility as inputs after the options and schema (4 files, in that exact order). 
+Use the R1, R2, I1, and I2 FASTQ files provided by the sequencing facility as inputs after the options and schema (4 files, in that exact order).
harpy/commands/snp.py (2)

142-149: Same host-only make_windows issue in mpileup path.

Please apply the same refactor here: run via a Snakemake rule in the container image or use a Python fallback.


69-75: Refactor to eliminate host-dependency on make_windows CLI invocation—integrate into Snakemake pipeline or call Python function directly.

The concern is confirmed. The code invokes make_windows via os.system() at lines 71 and 144 in harpy/commands/snp.py, requiring the binary in PATH on the host and breaking containerization. Since a Python implementation of make_windows already exists at harpy/scripts/make_windows.py, the best approach is to move region generation into the Snakemake pipeline (executed within the container image) or refactor the Python function to accept parameters directly rather than parsing sys.argv, allowing it to be called from Python code without shell invocation.

Two locations require updates:

  • Line 71 (freebayes function)
  • Line 144 (mpileup function)
♻️ Duplicate comments (5)
.github/workflows/createrelease.yml (3)

47-47: Fix double dollar sign in tag reference.

The tag expression uses $${{ matrix.env }} but should use a single $: ${{ matrix.env }}. Double dollar signs will cause the Docker tag to be malformed.

Apply this diff:

-          tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }}
+          tags: pdimens/harpy:${{ matrix.env }}_{{ github.ref_name }}

17-17: Remove invalid matrix entries: stitch is a conda environment and metassembly does not have a dedicated container.

The matrix references 11 environments, but only 9 correspond to actual containers built by the harpy containerize flow:

  • stitch is a conda environment (envs/stitch.yaml), not a container—remove it from the matrix
  • metassembly does not use a metassembly_latest container; metassembly.smk references assembly_latest instead—remove or verify if it should build a separate metassembly container

Update the matrix to: [align, assembly, deconvolution, demultiplex, phase, qc, report, simulations, variants]

Apply this diff:

-        env: [align, assembly, deconvolution, demultiplex, metassembly, phase, qc, report, simulations, stitch, variants]
+        env: [align, assembly, deconvolution, demultiplex, phase, qc, report, simulations, variants]

62-65: Update sed pattern to handle both _latest suffix and standalone latest tags.

The current sed command only replaces _latest (e.g., report_latest), but some snakefile container references use a standalone latest tag (e.g., metassembly.smk:1). Both patterns must be replaced to ensure all container references are versioned consistently.

Apply this diff to add a second substitution:

          for i in harpy/snakefiles/*.smk; do
-           sed -i "s/_latest/_${{ github.ref_name }}/g" $i
+           sed -i "s/_latest/_${{ github.ref_name }}/g; s/\blatest\b/${{ github.ref_name }}/g" $i
          done
harpy/common/create_pixi.py (2)

75-97: Update pixi version to match CI workflow.

The Dockerfile uses pixi:0.56.0 while the CI workflow expects v0.57.0. This version inconsistency can cause behavioral differences between local and CI environments.

Apply this diff to align versions:

-FROM ghcr.io/prefix-dev/pixi:0.56.0 AS build
+FROM ghcr.io/prefix-dev/pixi:0.57.0 AS build

104-124: Critical: Path mismatch with CI workflow will cause build failures.

The function creates directories under container/ but the CI workflow (createrelease.yml) expects .container/${{ matrix.env }}. This mismatch will cause Docker builds to fail with "context not found" errors.

Apply this diff to fix the path:

-    shutil.rmtree("container", ignore_errors=True)
+    shutil.rmtree(".container", ignore_errors=True)
     for env,deps in environ.items():
-        os.makedirs(f"container/{env}", exist_ok=True)
+        os.makedirs(f".container/{env}", exist_ok=True)
-        with open(f"container/{env}/Dockerfile", "w") as dockerfile:
+        with open(f".container/{env}/Dockerfile", "w") as dockerfile:
             dockerfile.write(dockerfile_text)
         if env == "report":
             subprocess.run(
-                f"pixi init container/{env} -c conda-forge -c r".split(),
+                f"pixi init .container/{env} -c conda-forge -c r".split(),
                 check = True
             )
         else:
             subprocess.run(
-                f"pixi init container/{env} -c conda-forge -c bioconda".split(),
+                f"pixi init .container/{env} -c conda-forge -c bioconda".split(),
                 check = True    
             )

         subprocess.run(
-            ["pixi", "add", "--no-progress", "--manifest-path", f"container/{env}/pixi.toml", *deps],
+            ["pixi", "add", "--no-progress", "--manifest-path", f".container/{env}/pixi.toml", *deps],
             check = True
         )
-        shutil.rmtree("container/.pixi", ignore_errors=True)
+        shutil.rmtree(".container/.pixi", ignore_errors=True)
🧹 Nitpick comments (1)
harpy/commands/phase.py (1)

29-29: Type mismatch: quiet is an int in CLI.

click defines quiet as IntRange(0,2), but Workflow.quiet is annotated bool. Recommend updating the annotation to int to reflect levels. As per the shared snippet in harpy/common/workflow.py.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d90efca and 3f77b83.

📒 Files selected for processing (21)
  • .github/workflows/createrelease.yml (3 hunks)
  • harpy/commands/align.py (2 hunks)
  • harpy/commands/assembly.py (1 hunks)
  • harpy/commands/deconvolve.py (1 hunks)
  • harpy/commands/demultiplex.py (1 hunks)
  • harpy/commands/impute.py (1 hunks)
  • harpy/commands/metassembly.py (1 hunks)
  • harpy/commands/phase.py (1 hunks)
  • harpy/commands/qc.py (1 hunks)
  • harpy/commands/resume.py (3 hunks)
  • harpy/commands/snp.py (2 hunks)
  • harpy/commands/sv.py (2 hunks)
  • harpy/commands/validate.py (2 hunks)
  • harpy/common/create_pixi.py (1 hunks)
  • harpy/common/file_ops.py (0 hunks)
  • harpy/common/workflow.py (9 hunks)
  • harpy/scripts/separate_singletons.py (0 hunks)
  • harpy/snakefiles/metassembly.smk (3 hunks)
  • pixi.toml (1 hunks)
  • resources/harpy.yaml (1 hunks)
  • resources/meta.yaml (1 hunks)
💤 Files with no reviewable changes (2)
  • harpy/common/file_ops.py
  • harpy/scripts/separate_singletons.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • pixi.toml
🧰 Additional context used
🧬 Code graph analysis (16)
harpy/commands/phase.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/qc.py (2)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/environments.py (2)
  • container (67-78)
  • container (100-111)
harpy/commands/sv.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/deconvolve.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/snp.py (2)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/environments.py (2)
  • container (67-78)
  • container (100-111)
harpy/commands/metassembly.py (2)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/environments.py (3)
  • container (67-78)
  • conda (32-64)
  • container (100-111)
harpy/snakefiles/metassembly.smk (1)
harpy/common/summaries.py (2)
  • metassembly (231-267)
  • assembly (78-124)
harpy/commands/align.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/common/create_pixi.py (1)
harpy/commands/environments.py (2)
  • containerize (11-51)
  • container (100-111)
harpy/commands/resume.py (2)
harpy/common/conda.py (1)
  • check_environments (108-130)
harpy/common/workflow.py (1)
  • Workflow (21-294)
harpy/commands/validate.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/assembly.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/commands/demultiplex.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
.github/workflows/createrelease.yml (1)
harpy/commands/environments.py (1)
  • container (100-111)
harpy/commands/impute.py (1)
harpy/common/workflow.py (2)
  • Workflow (21-294)
  • setup_snakemake (61-108)
harpy/common/workflow.py (4)
harpy/common/file_ops.py (3)
  • filepath (9-11)
  • gzip_file (18-23)
  • purge_empty_logs (25-32)
harpy/commands/environments.py (4)
  • container (67-78)
  • conda (32-64)
  • containerize (11-51)
  • container (100-111)
harpy/common/printing.py (1)
  • print_error (13-53)
harpy/common/conda.py (1)
  • create_conda_recipes (10-106)
🪛 Ruff (0.14.1)
harpy/common/create_pixi.py

110-110: subprocess call: check for execution of untrusted input

(S603)


115-115: subprocess call: check for execution of untrusted input

(S603)


120-120: subprocess call: check for execution of untrusted input

(S603)


121-121: Starting a process with a partial executable path

(S607)

🔇 Additional comments (27)
resources/harpy.yaml (1)

12-12: Clarify the rationale for the rich-click version constraint change.

The change from pinned rich-click =1.9 to >=1.9.3 introduces flexibility in dependency resolution but appears orthogonal to the PR's core objective of replacing Snakemake containers with smaller pixi-driven ones. Allowing open upper bounds on dependencies can introduce unexpected breaking changes in future updates.

Consider:

  • Is this change intentional or incidental?
  • Does rich-click >=1.9.3 introduce any known compatibility issues?
  • Should you pin a reasonable upper bound (e.g., rich-click >=1.9.3,<2.0) to prevent major version breaks?
.github/workflows/createrelease.yml (2)

23-28: Verify pixi setup correctly installs harpy.

The Install Harpy step uses prefix-dev/setup-pixi@v0.9.2 with activate-environment: true, but there is no explicit call to install harpy after pixi is set up. Ensure that harpy is installed in the pixi environment (either via pixi.toml or explicit pixi install call) so that the subsequent harpy containerize command succeeds.


26-26: I need to clarify which pixi product we're dealing with. Based on the workflow file context (containerization, pixi-dependent tasks), this appears to be the pixi CLI tool from prefix.dev, not PixiJS. Let me search for information specific to the pixi CLI package manager:

Based on the search results, v0.57.0 is the latest stable version of pixi (released on 2025-10-20), making the workflow's use of v0.56.0 a downgrade to an older patch version.

Let me search for specific details about what changed in v0.57.0 to understand why a downgrade might be necessary:

Use the latest pixi version v0.57.0 instead of downgrading to v0.56.0.

v0.57.0 is the latest stable release of pixi, and the workflow is currently downgrading to v0.56.0. No significant breaking changes between these two patch versions were documented. Unless there's a specific compatibility issue with the harpy containerize workflow that requires v0.56.0, upgrading to v0.57.0 would be beneficial for stability and access to recent improvements.

resources/meta.yaml (1)

52-52: LGTM: Dependency version updated.

The rich-click version bump to >=1.9.3 is appropriate and aligns with keeping dependencies current.

harpy/common/create_pixi.py (1)

7-73: LGTM: Environment definitions are well-structured.

The environment-to-dependency mappings are clear and comprehensive, with appropriate version pins where needed.

harpy/snakefiles/metassembly.smk (2)

79-84: LGTM: Container and conda environment properly specified.

The rule correctly specifies both the conda environment and container image, with the proper metaspades.py command.


221-231: LGTM: Output organization improved.

The updated params and shell command properly move assembly outputs into the athena directory for better organization.

harpy/common/workflow.py (3)

25-43: LGTM: Constructor signature updated for container-first design.

The addition of the container parameter to __init__ is a cleaner design that makes container usage explicit at workflow creation time. The change is consistently applied across all command modules in this PR.


61-86: LGTM: Method signature simplified by using instance attribute.

The removal of the container parameter from setup_snakemake and use of self.container instead is consistent with the constructor changes and simplifies the method signature.


266-267: LGTM: Conditional conda recipe creation based on container flag.

The logic correctly skips conda recipe creation when using containers, which aligns with the PR's goal of container-based deployment.

harpy/commands/qc.py (1)

49-50: LGTM: Workflow initialization updated to new API.

The changes correctly adapt to the new Workflow API, passing container to the constructor and removing it from setup_snakemake.

harpy/commands/sv.py (2)

62-63: LGTM: Leviathan workflow initialization updated.

The changes correctly implement the new Workflow API for the leviathan subcommand.


145-146: LGTM: NAIBR workflow initialization updated.

The changes correctly implement the new Workflow API for the naibr subcommand.

harpy/commands/metassembly.py (1)

42-44: LGTM: Metassembly workflow updated to new API and consolidated environments.

The changes correctly implement the new Workflow API and remove the redundant "spades" environment in favor of the consolidated "assembly" environment.

harpy/commands/assembly.py (1)

49-50: LGTM: Assembly workflow initialization updated to new API.

The changes correctly adapt to the new Workflow API, maintaining consistency with other command modules.

harpy/commands/demultiplex.py (1)

50-51: API alignment looks good.

Constructor/setup update matches Workflow changes.

harpy/commands/deconvolve.py (1)

36-37: API alignment looks good.

Constructor/setup update matches Workflow changes.

harpy/commands/snp.py (2)

59-60: API alignment looks good (freebayes).

Matches new Workflow constructor and setup signatures.


132-133: API alignment looks good (mpileup).

Matches new Workflow constructor and setup signatures.

harpy/commands/validate.py (2)

44-46: API alignment looks good (validate_bam).

Constructor/setup update matches Workflow changes.


89-90: API alignment looks good (validate_fastq).

Constructor/setup update matches Workflow changes.

harpy/commands/align.py (2)

54-55: LGTM! Clean API refactor.

The container parameter is correctly moved from setup_snakemake() to the Workflow constructor, aligning with the new API design.


125-126: LGTM! Consistent refactor.

The strobe command follows the same pattern as bwa, correctly passing container to the constructor and removing it from setup_snakemake().

harpy/commands/impute.py (1)

45-46: LGTM! Consistent with the refactor pattern.

The impute workflow correctly adopts the new API, passing container to the constructor and removing it from setup_snakemake().

harpy/commands/resume.py (3)

8-8: LGTM! Simplified imports.

Removing create_conda_recipes from the import is correct since Workflow.initialize() now handles conda recipe creation internally.


17-29: LGTM! Cleaner API surface.

Removing the conda parameter and deriving it from the workflow configuration simplifies the interface and ensures consistency with the original workflow setup.


65-66: Clean mutation with clear intent.

Using pop() to both assign and remove inputs from the config is appropriate given the comment explaining the intent.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f77b83 and 0c7b33b.

📒 Files selected for processing (1)
  • harpy/commands/simulate_variants.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
harpy/commands/simulate_variants.py (2)
harpy/common/workflow.py (1)
  • setup_snakemake (61-108)
harpy/commands/sv.py (1)
  • naibr (128-190)

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)

83-97: Add container or conda directives to rules using external tools.

Several rules use external tools but lack container or conda directives. Since this PR removes the top-level container in favor of per-rule containers, these rules may fail if the required tools aren't available in the execution environment:

  • preprocess_reference (lines 83-97): requires seqtk and samtools
  • index_snps (lines 99-105): requires bcftools
  • index_snps_gz (lines 107-113): requires tabix
  • index_alignments (lines 115-121): requires samtools
  • concat_groups (lines 179-198): requires concatenate_bam and samtools
  • infer_variants (lines 238-254): requires infer_sv

Consider adding container directives similar to the other rules. For example:

 rule preprocess_reference:
     input:
         genomefile
     output: 
         geno = workflow_geno,
         fai = f"{workflow_geno}.fai"
     log:
         f"{workflow_geno}.preprocess.log"
+    container:
+        "docker://pdimens/harpy:align_latest"
     shell: 
         """
         {{
             seqtk seq {input} > {output.geno}
             samtools faidx --fai-idx {output.fai} {output.geno}
         }} 2> {log}
         """

Apply similar patterns to the other rules, choosing the appropriate container based on the tools used (likely align_latest or variants_latest).

Also applies to: 99-113, 179-198, 238-254

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3381f7a and 8373ebb.

📒 Files selected for processing (2)
  • harpy/snakefiles/sv_naibr_phase.smk (5 hunks)
  • harpy/snakefiles/sv_naibr_pop_phase.smk (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • harpy/snakefiles/sv_naibr_phase.smk
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: qc
  • GitHub Check: freebayes
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: align BWA
  • GitHub Check: phase
  • GitHub Check: mpileup
  • GitHub Check: assembly
  • GitHub Check: sv naibr
  • GitHub Check: sv leviathan
  • GitHub Check: impute
  • GitHub Check: validate
🔇 Additional comments (1)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)

115-121: Verify the intended behavior for index file location.

The issue identified in the review comment is real: the index_alignments rule declares output as "{sample}.bai" (working directory), but samtools index {input} creates the index file next to the input BAM file. Since get_alignments returns a full path from bamlist (line 73), there's a mismatch. The get_align_index function (line 81) correctly returns aln[0] + ".bai" (full path), expecting the index next to the BAM file. This mismatch breaks Snakemake's dependency tracking, and phase_alignments won't correctly depend on index_alignments.

However, the proposed fix requires clarification:

  • If indexes should be next to BAM files (recommended for portability): Fix the output declaration to return a path matching where samtools creates it, rather than changing the command.
  • If indexes should be in the working directory: The command should be samtools index {input} {output}, AND get_align_index must also be updated to return f"{wildcards.sample}.bai" instead of the full path.

Confirm which approach aligns with the intended workflow architecture before applying the fix.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8373ebb and 008fce5.

📒 Files selected for processing (13)
  • harpy/reports/align_bxstats.qmd (1 hunks)
  • harpy/reports/align_stats.qmd (1 hunks)
  • harpy/reports/bcftools_stats.qmd (1 hunks)
  • harpy/reports/hapcut.qmd (1 hunks)
  • harpy/reports/impute.qmd (1 hunks)
  • harpy/reports/leviathan.qmd (1 hunks)
  • harpy/reports/leviathan_pop.qmd (1 hunks)
  • harpy/reports/naibr.qmd (1 hunks)
  • harpy/reports/naibr_pop.qmd (1 hunks)
  • harpy/reports/qc_bx_stats.qmd (1 hunks)
  • harpy/reports/stitch_collate.qmd (1 hunks)
  • harpy/reports/validate_bam.qmd (1 hunks)
  • harpy/reports/validate_fastq.qmd (1 hunks)
🔇 Additional comments (5)
harpy/reports/stitch_collate.qmd (1)

17-22: LGTM! Consistent with containerization approach.

Same pattern as other reports in this PR—replacing dynamic installation with explicit imports. The change is appropriate and maintains consistency across the codebase.

harpy/reports/align_bxstats.qmd (1)

8-13: LGTM! Explicit imports align with containerization goals.

The refactoring is consistent with the broader PR strategy. The explicit library() calls make dependencies clear and ensure failures occur early if packages are missing from the container.

harpy/reports/leviathan.qmd (1)

19-24: LGTM! Consistent pattern across reports.

The change maintains consistency with the containerization approach used throughout this PR. Explicit imports improve clarity and enable early failure detection.

harpy/reports/validate_bam.qmd (1)

9-14: LGTM! Completes consistent refactoring across all reports.

This final report file follows the same pattern established across all reviewed files. The uniform approach to replacing dynamic installation with explicit imports is clean and well-executed.

harpy/reports/naibr.qmd (1)

42-47: LGTM! Explicit package imports are properly supported by environment configuration.

The shift from dynamic package installation to explicit library() calls is well-founded. All required packages (r-dt, r-dplyr, r-tidyr, r-biocircos) are defined in the environment configuration, so the fail-fast behavior will correctly surface any issues during environment setup rather than at runtime.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (2)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)

137-138: Container image availability concern (previously flagged).

Per-rule container directives have been added across this workflow, referencing docker://pdimens/harpy:phase_latest, docker://pdimens/harpy:variants_latest, and docker://pdimens/harpy:report_latest. This aligns with the PR's objective of using smaller, specialized containers.

However, a previous review noted that these images may not exist on Docker Hub. Ensure these container images are built and pushed before merging, or the workflow will fail at runtime when Snakemake attempts to pull them.

Run the following script to verify the container images are accessible:

#!/bin/bash
# Description: Verify that the specified container images exist and are pullable

echo "Checking container images..."
for tag in phase_latest variants_latest report_latest; do
    echo "Checking pdimens/harpy:${tag}..."
    docker pull "pdimens/harpy:${tag}" --quiet && echo "✓ Found" || echo "✗ Not found"
done

Also applies to: 231-232, 315-316, 344-345

harpy/snakefiles/sv_naibr_phase.smk (1)

114-115: Container image availability concern (previously flagged).

The per-rule container directives reference the same pdimens/harpy images as in sv_naibr_pop_phase.smk. Please ensure these images are built and accessible before merging to avoid runtime failures.

See verification script in the review comment for harpy/snakefiles/sv_naibr_pop_phase.smk at lines 137-138.

Also applies to: 177-178, 261-262

🧹 Nitpick comments (2)
harpy/common/printing.py (1)

119-134: Consider adding type hints for consistency.

The new print_shellcmd_simple function is well-implemented and integrates nicely with the existing printing utilities. However, it lacks type hints for the text parameter and return value, unlike other functions in this module.

Apply this diff to add type hints:

-def print_shellcmd_simple(text):
+def print_shellcmd_simple(text: str) -> None:
harpy/snakefiles/sv_naibr_phase.smk (1)

71-81: Consider reordering outputs for consistency.

The index_alignments rule lists the BAI file before the named bam parameter in the outputs (lines 75-76). While functionally correct, it's more conventional to list named outputs first or keep them in logical order (BAM before BAI).

Apply this diff for better readability:

     output:
-        temp("workflow/input/bam/{sample}.bam.bai"),
-        bam = temp("workflow/input/bam/{sample}.bam")
+        bam = temp("workflow/input/bam/{sample}.bam"),
+        bai = temp("workflow/input/bam/{sample}.bam.bai")

Then update the shell command to reference the named output:

     shell:
         """
         ln -sr {input} {output.bam}
-        samtools index {output.bam}
+        samtools index {output.bam} {output.bai}
         """
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 008fce5 and 6bb6335.

📒 Files selected for processing (6)
  • harpy/commands/diagnose.py (2 hunks)
  • harpy/common/launch.py (2 hunks)
  • harpy/common/printing.py (2 hunks)
  • harpy/common/workflow.py (10 hunks)
  • harpy/snakefiles/sv_naibr_phase.smk (5 hunks)
  • harpy/snakefiles/sv_naibr_pop_phase.smk (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
harpy/common/printing.py (1)
harpy/common/cli_types_params.py (1)
  • sanitize_shell (8-9)
harpy/common/workflow.py (5)
harpy/common/file_ops.py (3)
  • filepath (9-11)
  • gzip_file (18-23)
  • purge_empty_logs (25-32)
harpy/commands/environments.py (4)
  • container (67-78)
  • conda (32-64)
  • containerize (11-51)
  • container (100-111)
harpy/common/printing.py (1)
  • print_error (15-55)
harpy/common/launch.py (1)
  • launch_snakemake (95-307)
harpy/commands/sv.py (1)
  • naibr (128-190)
harpy/snakefiles/sv_naibr_phase.smk (1)
harpy/commands/sv.py (1)
  • naibr (128-190)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)
harpy/common/summaries.py (2)
  • naibr_phase_pop (730-771)
  • phase (269-309)
harpy/commands/diagnose.py (2)
harpy/common/printing.py (2)
  • print_error (15-55)
  • print_shellcmd_simple (119-134)
harpy/common/file_ops.py (1)
  • safe_read (34-41)
🪛 Ruff (0.14.1)
harpy/commands/diagnose.py

106-106: Starting a process with a shell, possible injection detected

(S605)


159-159: Starting a process with a shell, possible injection detected

(S605)


163-163: Cannot use an escape sequence (backslash) in f-strings on Python 3.11 (syntax was added in Python 3.12)

(invalid-syntax)


167-167: Starting a process with a shell, possible injection detected

(S605)


170-170: Cannot use an escape sequence (backslash) in f-strings on Python 3.11 (syntax was added in Python 3.12)

(invalid-syntax)


176-176: Starting a process with a shell, possible injection detected

(S605)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: impute
  • GitHub Check: mpileup
  • GitHub Check: freebayes
  • GitHub Check: phase
  • GitHub Check: align BWA
  • GitHub Check: align strobe
  • GitHub Check: simulate variants
  • GitHub Check: qc
  • GitHub Check: sv naibr
  • GitHub Check: deconvolve
  • GitHub Check: sv leviathan
  • GitHub Check: assembly
  • GitHub Check: validate
🔇 Additional comments (5)
harpy/common/workflow.py (3)

25-46: LGTM! Clean API refactoring.

The addition of the container parameter to the Workflow constructor and making it an instance attribute (rather than a parameter to setup_snakemake) is a cleaner design that better reflects the container mode as a workflow-level configuration.


168-183: Improved error handling for missing snakefiles.

The rewritten fetch_snakefile method now provides clearer error messages and properly handles missing snakefile resources with explicit directory creation.


266-267: Code change is correct; conda recipe creation is properly gated in container mode.

Verification confirms the conditional at lines 266–267 correctly skips conda recipe creation when self.container is True. The only other call site in environments.py:53 is intentional, as it explicitly creates a non-container workflow where conda recipes should always be generated. No issues found.

harpy/common/launch.py (1)

95-95: LGTM! API change is consistent with Workflow refactoring.

The removal of the workflow parameter from launch_snakemake aligns with the changes in harpy/common/workflow.py where container is now an instance attribute rather than being passed through method parameters.

harpy/snakefiles/sv_naibr_pop_phase.smk (1)

108-118: LGTM! Cleaner alignment indexing pattern.

The updated index_alignments rule now uses a symlink approach to avoid duplicating alignment data, and explicitly declares both the BAM and BAI as temporary outputs. This is more efficient and clearer than the previous implementation.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
resources/changelog.md (1)

1-24: Changelog is missing documentation of major architectural changes.

The PR objectives describe significant infrastructure changes (replacing Snakemake containers with Pixi-driven multi-stage builds, updating the Workflow API, removing fetch_snakefile, adjusting CI dependencies), but none of these are documented in the changelog. This creates a gap between the PR's scope and what users/developers will see when reading release notes.

Add entries documenting:

  • Container refactoring: single large container → multiple smaller environment-tagged containers with Pixi multistage builds
  • Workflow API: updated to accept a container flag
  • Removed: fetch_snakefile function
  • Removed: singleton-separation script
  • Updated: CI and report dependency handling
  • Added: per-rule container directives across Snakefiles

Consider creating a separate subsection (e.g., "# infrastructure" or "# internal") to document these breaking/significant internal changes, or expand the "# changes" section to include them.

🧹 Nitpick comments (1)
resources/meta.yaml (1)

49-49: Loose version constraint on conda dependency.

Line 49 specifies conda >24.7 without an upper bound. This is permissive and could introduce issues if a future conda version introduces breaking changes. Consider pinning an upper bound (e.g., conda >=24.7,<25) to better protect against version drift.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6bb6335 and a651dbe.

📒 Files selected for processing (3)
  • resources/changelog.md (1 hunks)
  • resources/harpy.yaml (0 hunks)
  • resources/meta.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • resources/harpy.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
  • GitHub Check: sv naibr
  • GitHub Check: impute
  • GitHub Check: align BWA
  • GitHub Check: qc
  • GitHub Check: align strobe
  • GitHub Check: sv leviathan
  • GitHub Check: freebayes
  • GitHub Check: phase
  • GitHub Check: mpileup
  • GitHub Check: simulate variants
  • GitHub Check: deconvolve
  • GitHub Check: assembly
  • GitHub Check: validate
  • GitHub Check: demux meier2021
🔇 Additional comments (1)
resources/meta.yaml (1)

52-53: Dependency versions are compatible, but entry point discrepancy needs clarification.

Version upgrades are sound: rich-click 1.9.3 and snakemake-minimal 9.11 have minimal breaking changes and should work with the existing codebase.

However, the original concern about entry point removals is based on incomplete information. The four entry points in question—deconvolve_alignments, extract_bxtags, separate_singletons, and separate_validbx—are still defined in pyproject.toml (lines 30, 32, 40, 41) and therefore exist in the codebase. They are not present in meta.yaml (lines 20-37).

This indicates the conda recipe intentionally excludes these entry points from the conda package. Verify whether this omission is intentional (e.g., these scripts are not meant to be installed via conda) or if meta.yaml should be synced with pyproject.toml to include all defined entry points.

@pdimens pdimens merged commit 969f4df into main Oct 22, 2025
16 checks passed
@pdimens pdimens deleted the pixi_container branch October 22, 2025 17:44
@coderabbitai coderabbitai bot mentioned this pull request Mar 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant