Conversation
📝 WalkthroughWalkthroughReplaces Snakemake-based container generation with a Pixi-driven per-environment Dockerfile generator and CI matrix builds; adds per-rule container directives across many Snakefiles; changes Workflow API to accept a container flag and removes fetch_snakefile; deletes a singleton-separation script; updates CI, packaging, and report loading behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Harpy_CLI
participant create_pixi
participant Pixi_CLI
participant FS
User->>Harpy_CLI: harpy containerize()
Harpy_CLI->>create_pixi: create_pixi_dockerfiles()
create_pixi->>FS: remove existing container/ directory
loop per environment
create_pixi->>FS: write container/{env}/Dockerfile & pixi.toml
create_pixi->>Pixi_CLI: pixi init (channels)
create_pixi->>Pixi_CLI: pixi add (deps -> container/{env}/pixi.toml)
create_pixi->>FS: remove container/.pixi cache
end
create_pixi-->>Harpy_CLI: done
sequenceDiagram
participant GitHub_Actions
participant Matrix_Job
participant setup-pixi
participant Buildx
participant Registry
GitHub_Actions->>Matrix_Job: trigger per-environment build (matrix)
Matrix_Job->>setup-pixi: prefix-dev/setup-pixi@v0.9.2 (activate)
setup-pixi-->>Matrix_Job: pixi available
Matrix_Job->>Buildx: docker/build-push (context: container/${{ matrix.env }})
Buildx->>Registry: push pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }}
Registry-->>Matrix_Job: image pushed
Matrix_Job-->>GitHub_Actions: job complete
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (1)
🔇 Additional comments (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
harpy/snakefiles/metassembly.smk (2)
1-1: Remove the global containerized declaration.This file still contains
containerized: "docker://pdimens/harpy:latest"at the top level, which is inconsistent with the PR's objective to move to per-rule container specifications. Other workflow files in this PR have removed their global container declarations.Apply this diff to remove the global declaration:
-containerized: "docker://pdimens/harpy:latest" -
108-112: Inconsistent container strategy in metassembly workflow.The
spades_assemblyrule still hascontainer: Nonewhile other assembly-related rules in this PR are being containerized. This file appears to be incompletely migrated to the per-rule container strategy. Consider either:
- Adding an appropriate container specification to this rule, or
- Documenting why this rule should remain uncontainerized
🧹 Nitpick comments (4)
harpy/common/create_pixi.py (3)
101-110: Consider adding error handling for file operations.The
reset_pixi_global()function assumes the~/.pixidirectory structure exists and usesglob.glob()which returns an empty list if no matches are found. While this is safe, consider adding explicit directory existence checks for better error messages if the expected structure is missing.def reset_pixi_global(): # remove any existing global packages pixidir = os.environ['HOME'] + "/.pixi" + if not os.path.exists(pixidir): + return # Nothing to clean up for f in glob.glob(pixidir + "/bin/*"):
140-142: Remove debug print statement.Line 142 contains a
print(global_manifest)statement that appears to be debug code. Consider removing it or converting it to proper logging._pix = subprocess.run("pixi global list".split(), capture_output = True, text = True) global_manifest = _pix.stdout.splitlines()[0].split()[-1].strip("\'") - print(global_manifest)
155-167: Capture stderr on subprocess failure for better debugging.When
subprocess.run()fails, the code prints_pix.stderrbut the subprocess was not configured to capture stderr. Addcapture_output=Trueorstderr=subprocess.PIPEto the subprocess calls to actually capture error messages.if env == "deconvolution": _pix = subprocess.run( - ["pixi", "global", "install", *_chan, "--environment", env, "--expose", "QuickDeconvolution", *deps] + ["pixi", "global", "install", *_chan, "--environment", env, "--expose", "QuickDeconvolution", *deps], + capture_output=True, text=True ) if _pix.returncode > 0: print(_pix.stderr) sys.exit(1) else: _pix = subprocess.run( - ["pixi", "global", "install", *_chan, "--environment", env, *deps] + ["pixi", "global", "install", *_chan, "--environment", env, *deps], + capture_output=True, text=True )harpy/commands/environments.py (1)
18-19: Consider removing or documenting the commented-out line.The commented-out call to
create_pixi_toml()should either be removed if it's no longer needed, or documented with a TODO/FIXME comment explaining why it's preserved.Apply this diff to remove the dead code:
create_pixi_dockerfiles() - #create_pixi_toml()Or if it's needed for future work, document it:
create_pixi_dockerfiles() - #create_pixi_toml() + # TODO: Re-enable create_pixi_toml() once [reason]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (29)
.github/workflows/createrelease.yml(3 hunks).github/workflows/tests.yml(1 hunks)harpy/commands/environments.py(1 hunks)harpy/common/conda.py(1 hunks)harpy/common/create_pixi.py(1 hunks)harpy/common/workflow.py(1 hunks)harpy/snakefiles/align_bwa.smk(5 hunks)harpy/snakefiles/align_strobe.smk(4 hunks)harpy/snakefiles/assembly.smk(5 hunks)harpy/snakefiles/deconvolve.smk(1 hunks)harpy/snakefiles/demultiplex_meier2021.smk(3 hunks)harpy/snakefiles/environments.smk(1 hunks)harpy/snakefiles/impute.smk(0 hunks)harpy/snakefiles/metassembly.smk(1 hunks)harpy/snakefiles/phase.smk(4 hunks)harpy/snakefiles/qc.smk(3 hunks)harpy/snakefiles/simulate_snpindel.smk(2 hunks)harpy/snakefiles/simulate_variants.smk(2 hunks)harpy/snakefiles/snp_freebayes.smk(2 hunks)harpy/snakefiles/snp_mpileup.smk(1 hunks)harpy/snakefiles/sv_leviathan.smk(4 hunks)harpy/snakefiles/sv_leviathan_pop.smk(5 hunks)harpy/snakefiles/sv_naibr.smk(2 hunks)harpy/snakefiles/sv_naibr_phase.smk(3 hunks)harpy/snakefiles/sv_naibr_pop.smk(3 hunks)harpy/snakefiles/sv_naibr_pop_phase.smk(5 hunks)harpy/snakefiles/validate_bam.smk(1 hunks)harpy/snakefiles/validate_fastq.smk(1 hunks)resources/Dockerfile(1 hunks)
💤 Files with no reviewable changes (1)
- harpy/snakefiles/impute.smk
🧰 Additional context used
🧬 Code graph analysis (13)
harpy/snakefiles/simulate_variants.smk (1)
harpy/common/summaries.py (2)
simulate_snpindel(331-396)simulate_variants(407-448)
harpy/common/workflow.py (1)
harpy/commands/environments.py (1)
container(68-79)
harpy/common/conda.py (1)
harpy/common/summaries.py (2)
simulate_variants(407-448)simulate_snpindel(331-396)
harpy/snakefiles/snp_mpileup.smk (1)
harpy/common/summaries.py (1)
snp_mpileup(479-507)
harpy/snakefiles/demultiplex_meier2021.smk (1)
harpy/common/summaries.py (1)
demultiplex_meier2021(148-181)
harpy/common/create_pixi.py (1)
harpy/commands/environments.py (1)
deps(22-29)
harpy/snakefiles/metassembly.smk (1)
harpy/common/summaries.py (2)
metassembly(231-267)assembly(78-124)
harpy/snakefiles/assembly.smk (1)
harpy/common/summaries.py (2)
metassembly(231-267)assembly(78-124)
harpy/snakefiles/sv_leviathan_pop.smk (1)
harpy/common/summaries.py (1)
leviathan(539-571)
harpy/snakefiles/sv_naibr_pop.smk (1)
harpy/common/summaries.py (1)
naibr_pop(656-691)
harpy/snakefiles/simulate_snpindel.smk (1)
harpy/commands/simulate_variants.py (2)
snpindel(44-121)cnv(232-296)
harpy/commands/environments.py (1)
harpy/common/create_pixi.py (2)
create_pixi_dockerfiles(77-98)create_pixi_toml(133-172)
harpy/snakefiles/phase.smk (1)
harpy/common/summaries.py (1)
phase(269-309)
🪛 actionlint (1.7.8)
.github/workflows/createrelease.yml
91-91: property "version" is not defined in object type {env: string}
(expression)
🪛 Ruff (0.14.1)
harpy/common/create_pixi.py
140-140: subprocess call: check for execution of untrusted input
(S603)
155-155: subprocess call: check for execution of untrusted input
(S603)
156-156: Starting a process with a partial executable path
(S607)
162-162: subprocess call: check for execution of untrusted input
(S603)
163-163: Starting a process with a partial executable path
(S607)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: assembly
- GitHub Check: deconvolve
- GitHub Check: sv naibr
- GitHub Check: mpileup
- GitHub Check: align BWA
- GitHub Check: freebayes
- GitHub Check: validate
- GitHub Check: phase
- GitHub Check: sv leviathan
- GitHub Check: align strobe
- GitHub Check: impute
- GitHub Check: qc
- GitHub Check: simulate variants
- GitHub Check: demux meier2021
🔇 Additional comments (34)
harpy/snakefiles/validate_bam.smk (1)
73-74: LGTM! Container specification added for report generation.The addition of the
report_latestcontainer for thecreate_reportrule aligns with the PR's containerization strategy. The rule retains the conda environment specification as a fallback.harpy/snakefiles/simulate_snpindel.smk (2)
97-98: LGTM! Containerization for simulation rules.The addition of
simulations_latestcontainer specifications to bothsimulate_haploidandsimulate_diploidrules is consistent with the PR's per-rule containerization approach.
169-170: LGTM! Consistent container specification.The
simulate_diploidrule correctly uses the samesimulations_latestcontainer assimulate_haploid, ensuring consistent execution environments..github/workflows/tests.yml (1)
614-615: LGTM! Disk space optimization for assembly tests.Adding the disk cleanup step before assembly tests is a sensible optimization, as assembly operations can be disk-intensive. The
jlumbroso/free-disk-spaceaction is a well-maintained solution for this purpose.harpy/snakefiles/validate_fastq.smk (1)
88-89: LGTM! Consistent containerization for report generation.The container specification matches the identical change in
validate_bam.smk, ensuring consistent report generation across validation workflows.harpy/snakefiles/sv_naibr_pop_phase.smk (3)
140-141: LGTM! Appropriate containerization for phasing step.The
phase_alignmentsrule correctly uses thephase_latestcontainer, aligning with its phasing functionality.
233-234: LGTM! Appropriate containerization for variant calling.The
call_variantsrule correctly uses thevariants_latestcontainer, matching its variant calling functionality.
317-318: LGTM! Consistent containerization for reporting rules.Both
sample_reportsandaggregate_reportrules correctly use thereport_latestcontainer, ensuring consistent report generation across the workflow.Also applies to: 346-347
harpy/common/workflow.py (1)
84-84: I'll perform a broader search of the codebase and check Snakemake documentation to verify this concern.<!-- [scratchpad_update]
[observations]
Web search results show:
- Snakemake documentation references --software-deployment-method as a command-line option
- Results show examples like "--software-deployment-method conda" and "--software-deployment-method conda apptainer" (list of two values)
- Result 5 mentions: "deprecates --use-conda and --use-singularity in favor of --software-deployment-method conda or --software-deployment-method apptainer and --software-deployment-method conda apptainer"
- Result 9 example: "snakemake --software-deployment-method conda apptainer" (space-separated values)
- Result 2 shows "--software-deployment-method conda" (single value)
- No examples show YAML profile format with this parameter yet
[analysis]
- Snakemake documentation shows both single values (e.g., "conda") and multiple values (e.g., "conda apptainer") are accepted
- Need to verify YAML profile format specifically and whether string vs list format is valid in YAML profiles
- Need to look at the actual code in harpy/common/workflow.py to see context
[pending]
- Check harpy/common/workflow.py context at line 84
- Search for other software-deployment-method usage in this codebase
- Verify YAML profile format handling
[scratchpad_end] -->Now let me search the codebase to understand the context and find other usages of this parameter:<!-- [scratchpad_update]
[observations]
Still no direct YAML profile example showing software-deployment-method format
- Web results show profile YAML format: "key: value" pairs
- Example: "cluster: qsub", "use-conda: True", etc.
- Command-line examples show both single and multiple values: "--software-deployment-method conda" and "--software-deployment-method conda apptainer"
- Documentation shows parameter can accept both formats on command line, but profile YAML format not explicitly shown
[pending]
- Need to check harpy/common/workflow.py file to see full context
- Need to search codebase for any existing software-deployment-method usage
- Verify if string or list format is expected in YAML profiles by checking examples
[actions]
- Search harpy/common/workflow.py for context
- Search entire codebase for software-deployment-method
- Look for profile YAML files or examples
[scratchpad_end] -->Now let me search the repository for the actual file and context:<!-- [scratchpad_update]
[observations]
Key finding in result index 22-11:
"--software-deployment-method {apptainer,conda,env-modules} [{apptainer,conda,env-modules} ...]"
This explicitly shows the CLI accepts: multiple values OR single values (note the optional part in square brackets)Results also show:
- MegaLinter config shows the syntax includes optional multiple values
- Snakemake accepts "conda apptainer" (two values space-separated on command line)
- Snakemake accepts "conda" (single value)
- Snakemake accepts "apptainer" (single value)
Still need to see how this translates to YAML profile format.
From result 14-4, 14-15, 14-25: Profile YAML follows command-line format, but unclear if list notation needed[analysis]
The command-line format shows space-separated values: --software-deployment-method conda apptainer
The question is: should the YAML profile use:
- "software-deployment-method: apptainer" (string)
- "software-deployment-method: [apptainer]" (list)
- Something else?
The code review comment mentions changing from list ["conda", "apptainer"] to a string "apptainer". Need to verify what format Snakemake profiles actually expect.
[scratchpad_end] -->Based on my web search findings, Snakemake's
--software-deployment-methodparameter accepts both single and multiple values like{apptainer,conda,env-modules} [{apptainer,conda,env-modules} ...]. The documentation examples show both formats work on the command line: use--software-deployment-method condaor--software-deployment-method apptaineror--software-deployment-method conda apptainer.However, I couldn't find definitive documentation showing the exact YAML profile format for this parameter (string vs. list), and the codebase search was inconclusive. Let me provide the rewritten review comment based on what was verified:
Verify that Snakemake's profile configuration accepts string format for
software-deployment-method.The
--software-deployment-methodparameter accepts single values like "conda" or "apptainer" as well as multiple values like "conda apptainer". However, YAML profile syntax for this parameter is not explicitly documented. Verify that the string value"apptainer"(or"conda" if not container-only) is correctly accepted in the profile's YAML configuration, not as a list. If Snakemake requires a list format in YAML profiles, update the code accordingly.harpy/common/conda.py (1)
65-67: I need to gather more context about the conda environment and verify the actual simulation workflows.Now let me examine the conda.py file and check for active simulation workflows:Based on my analysis of the shell script output and web search results, I now have enough information to provide a comprehensive response. Let me generate the final rewritten review comment.No simulation workflows depend on mimick; the removed linkedreads command was deprecated with sys.exit(1).
The shell script found that mimick is only referenced in the deprecated
linkedreadscommand inharpy/commands/simulate.py, which is markeddeprecated=Trueand immediately exits withsys.exit(1). Mimick simulates linked-read FASTQ data from various chemistries, allowing simulation of haplotypes and barcodes, while simuG is a tool for simulating genomic variants (SNPs, INDELs, CNVs, inversions, translocations), making it a unique general-purpose genome simulator. These are distinct tools for different purposes. Since the linkedreads command is non-functional and thesimulationsenvironment (lines 65-67) contains onlysimug>1.0.0(which serves genome variant simulation, not linked-read simulation), the removal of mimick from this environment will not break active simulation workflows.harpy/snakefiles/deconvolve.smk (2)
56-57: LGTM: Per-rule container specification added.The
deconvolverule now correctly specifies its own container, aligning with the shift to environment-specific containers.
61-69: ****The review comment incorrectly assumes that
extract_forwardrule lacks access toseqtk.seqtk is available as a global environment dependency (defined in resources/meta.yaml and resources/harpy.yaml). The
extract_forwardrule requires no explicit container or conda directive to access it—rules without such specifications use the global environment by default.This pattern is consistent throughout the codebase: the
interleaverule in the same file explicitly specifiescontainer: Nonewhile usingseqtk mergepe, and many other snakefiles useseqtkwithout explicit container/conda specifications. The absence of a container directive does not prevent access to globally-available tools.Likely an incorrect or invalid review comment.
harpy/snakefiles/assembly.smk (3)
51-52: LGTM: Assembly rules correctly containerized.The assembly-related rules (
cloudspades,scaffolding,QUAST_assessment,BUSCO_analysis) now appropriately use theassembly_latestcontainer, aligning with the per-environment containerization strategy.Also applies to: 106-107, 132-133, 154-155
172-173: LGTM: Correct container for reporting.The
build_reportrule appropriately usesqc_latestsince it runs MultiQC for quality control reporting.
60-67: Verify seqtk availability in the execution environment.The
interleave_fastqrule no longer specifies a container or conda environment. Ensure thatseqtkis available in the default execution context.harpy/snakefiles/simulate_variants.smk (2)
62-63: LGTM: Simulation rules correctly containerized.The
simulate_haploidandsimulate_diploidrules now appropriately use thesimulations_latestcontainer.Also applies to: 119-120
39-45: Verify bcftools availability for convert_vcf rule.The
convert_vcfrule (lines 39-45) no longer specifies a container or conda environment. Ensure thatbcftoolsis available in the default execution context.harpy/snakefiles/phase.smk (2)
131-132: LGTM: Phase rules correctly containerized.The HapCut2-related rules (
extract_hairs,link_fragments,phase) now appropriately use thephase_latestcontainer.Also applies to: 152-153, 172-173
268-269: LGTM: Correct container for Quarto reporting.The
phase_reportrule appropriately usesreport_latestfor Quarto-based report generation.harpy/snakefiles/demultiplex_meier2021.smk (3)
71-72: LGTM: Demultiplexing rule correctly containerized.The
demultiplexrule appropriately uses thedemultiplex_latestcontainer for the dmox tool.
93-94: LGTM: QC rules correctly containerized.The
assess_qualityandquality_reportrules appropriately use theqc_latestcontainer for falco and MultiQC tools.Also applies to: 150-151
38-41: Verify haplotag_acbd availability in the host environment.The
barcode_segmentsrule explicitly setscontainer: None, meaning it will execute in the host environment. Ensure that thehaplotag_acbdcommand is available in the execution context.harpy/snakefiles/snp_mpileup.smk (1)
228-229: LGTM: Correct container for variant reporting.The
variant_reportrule appropriately usesreport_latestfor Quarto-based report generation, consistent with other reporting rules in the codebase.harpy/common/create_pixi.py (2)
77-98: LGTM: Dockerfile generation logic is sound.The
create_pixi_dockerfiles()function correctly generates per-environment Dockerfiles with appropriate channel configurations and special handling for the deconvolution environment.
140-167: Static analysis security warnings are false positives.The static analysis tool flags subprocess usage as potentially unsafe (S603, S607). However, in this context:
- The
pixicommand is a legitimate build tool- All command arguments come from the hardcoded
environdictionary (lines 9-75)- No user input is passed to subprocess
These warnings can be safely ignored for this use case.
harpy/snakefiles/metassembly.smk (1)
82-86: Verify that theassembly_latestcontainer exposes the executable asmetaspadesnotmetaspades.py.The codebase shows an inconsistency: the
error_correctionrule usesmetaspades(line 86), while thespades_assemblyrule usesmetaspades.py(line 112). SPAdes documentation consistently referencesmetaspades.pyas the standard executable. Without access to the container image configuration, the executable name change cannot be verified through the repository alone. Confirm that thedocker://pdimens/harpy:assembly_latestcontainer either wrapsmetaspades.pyasmetaspadesor installs SPAdes with that alternative naming.resources/Dockerfile (1)
1-7: LGTM! Clean pixi-based Dockerfile.The Dockerfile correctly uses the pixi base image, copies the required manifest, and cleans up the cache to reduce image size.
harpy/snakefiles/align_strobe.smk (1)
85-86: LGTM! Per-rule containers properly specified.The container specifications follow a consistent pattern and align with the PR's goal of using smaller, environment-specific containers. The
_latesttags will be replaced with version tags during the release process.Also applies to: 224-225, 281-282, 301-302
.github/workflows/createrelease.yml (1)
70-78: Verify the shell directive placement with setup-pixi action.The
shell: micromamba-shell {0}directive appears to be set on auses:action step (lines 71-76), but shell directives are only applicable torun:steps. The shell directive should likely be moved to line 78 whererun: harpy containerizeis executed, or the step structure needs adjustment.Please verify the intended structure. If the shell should apply to the
runcommand, consider restructuring:- name: Setup Pixi uses: prefix-dev/setup-pixi@v0.9.2 with: pixi-version: v0.57.0 cache: true auth-host: prefix.dev auth-token: ${{ secrets.PREFIX_DEV_TOKEN }} - name: Recreate container shell: micromamba-shell {0} run: harpy containerizeharpy/snakefiles/align_bwa.smk (1)
48-49: LGTM! Container specifications consistent with broader changes.The per-rule container approach is correctly implemented and follows the same pattern as other workflow files in this PR.
Also applies to: 106-107, 245-246, 302-303, 322-323
harpy/snakefiles/qc.smk (1)
60-61: LGTM! QC workflow properly containerized.The container specifications for QC-related rules follow the established pattern and align with the per-environment container strategy.
Also applies to: 102-103, 128-129
harpy/snakefiles/sv_leviathan.smk (1)
50-51: LGTM! Structural variant workflow properly containerized.The container specifications correctly segregate alignment, variant calling, and reporting operations into their respective environment-specific containers.
Also applies to: 71-72, 106-107, 200-201
harpy/snakefiles/sv_naibr_phase.smk (1)
117-118: LGTM! Phased variant calling workflow properly containerized.The container specifications correctly include the phasing environment and follow the established pattern for variant calling and reporting.
Also applies to: 180-181, 264-265
harpy/snakefiles/sv_leviathan_pop.smk (1)
119-120: Based on the script output, the review comment's assessment is accurate and well-justified:Verified facts:
- bwa index produces files with extensions amb, ann, bwt, pac and sa
sv_leviathan_pop.smkoutputs exactly these file extensions (lines 112:.ann,.bwt,.pac,.sa,.amb), confirming it runs bwa indexsv_naibr_pop.smkandsnp_freebayes.smkuse onlyseqtk seqandsamtools faidxwith no bwa operations- Container difference is present: sv_leviathan_pop has
align_latest, while the other three workflows have no container directiveConclusion:
The container assignment in sv_leviathan_pop is intentional and justified. Thealign_latestcontainer is correctly applied only to the workflow that requires bwa indexing. The other workflows legitimately require no such container since they only use seqtk and samtools, which don't demand the align environment. This represents correct, differentiated container usage across workflows with distinct tool requirements.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/createrelease.yml (1)
91-91: Fix the template expression syntax error.Line 91 still contains a double
$$which is incorrect GitHub Actions YAML syntax. Template expressions should use a single$. This matches the issue flagged in the previous review and needs to be corrected.Apply this diff:
- tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }} + tags: pdimens/harpy:${{ matrix.env }}_{{ github.ref_name }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/createrelease.yml(3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: align strobe
- GitHub Check: qc
- GitHub Check: assembly
- GitHub Check: simulate variants
- GitHub Check: sv naibr
- GitHub Check: impute
- GitHub Check: align BWA
- GitHub Check: mpileup
- GitHub Check: freebayes
- GitHub Check: sv leviathan
- GitHub Check: phase
- GitHub Check: validate
- GitHub Check: demux meier2021
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (2)
.github/workflows/createrelease.yml (2)
70-78: Critical: Invalid workflow step structure mixing action and command.Lines 77-78 incorrectly place
shellandrunkeys at the same indentation level as theusesaction. GitHub Actions does not allow mixing action execution (uses) with shell command execution (run) in the same step. This will cause a workflow syntax error.Apply this diff to split into separate steps:
- name: Recreate containers uses: prefix-dev/setup-pixi@v0.9.2 with: pixi-version: v0.57.0 cache: true auth-host: prefix.dev auth-token: ${{ secrets.PREFIX_DEV_TOKEN }} - shell: micromamba-shell {0} - run: harpy containerize + - name: Run harpy containerize + shell: micromamba-shell {0} + run: harpy containerize
86-91: Critical: Fix template syntax errors in image tag.Line 91 has two syntax errors:
- Double
$$before the first template expression (should be single$)- Missing
$before the second template expression{{ github.ref_name }}These will cause incorrect tag formatting or workflow failure.
Apply this diff to fix the tag syntax:
- name: Build and push uses: docker/build-push-action@v6 with: context: .container/${{ matrix.env }} push: true - tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }} + tags: pdimens/harpy:${{ matrix.env }}_${{ github.ref_name }}
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
.github/workflows/createrelease.yml(3 hunks)harpy/commands/environments.py(1 hunks)harpy/common/create_pixi.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
harpy/commands/environments.py (1)
harpy/common/create_pixi.py (1)
create_pixi_dockerfiles(99-117)
🪛 Ruff (0.14.1)
harpy/common/create_pixi.py
110-110: subprocess call: check for execution of untrusted input
(S603)
112-112: subprocess call: check for execution of untrusted input
(S603)
114-114: subprocess call: check for execution of untrusted input
(S603)
115-115: Starting a process with a partial executable path
(S607)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: sv naibr
- GitHub Check: freebayes
- GitHub Check: assembly
- GitHub Check: align strobe
- GitHub Check: sv leviathan
- GitHub Check: impute
- GitHub Check: mpileup
- GitHub Check: align BWA
- GitHub Check: phase
- GitHub Check: simulate variants
- GitHub Check: qc
- GitHub Check: demux meier2021
- GitHub Check: validate
🔇 Additional comments (4)
harpy/commands/environments.py (1)
7-7: LGTM - Clean import for new architecture.The import correctly references the new pixi-based container generation function.
harpy/common/create_pixi.py (1)
7-73: LGTM - Well-structured environment definitions.The environment-to-dependency mappings are clear and align with the matrix strategy in the CI workflow.
.github/workflows/createrelease.yml (2)
9-38: LGTM - Clean tarball generation job.The renamed job properly versions the source files and creates a release artifact.
40-69: LGTM - Matrix strategy aligns with environment definitions.The matrix values match the environment keys defined in
harpy/common/create_pixi.py, ensuring all 11 environments get their own containers.
There was a problem hiding this comment.
Actionable comments posted: 4
♻️ Duplicate comments (1)
.github/workflows/createrelease.yml (1)
76-76: Fix the double dollar sign in the tag template.The tag contains
$${{ matrix.env }}with a double$$, which will produce a literal$in the output instead of interpolating the variable. This issue was previously flagged but remains unfixed.Apply this diff to fix the tag:
- tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }} + tags: pdimens/harpy:${{ matrix.env }}_${{ github.ref_name }}Note: GitHub Actions syntax requires single
$for variable interpolation. Also fixed the second interpolation which was missing the opening$.
🧹 Nitpick comments (1)
.github/workflows/tests.yml (1)
68-72: Consider enabling Pixi cache to improve CI performance.The Pixi setup step has
cache: falsein all jobs. Unless intentionally disabled for testing, enabling the cache could significantly speed up workflow runs.Apply this diff if caching is desired:
- name: Install Harpy uses: prefix-dev/setup-pixi@v0.9.2 with: pixi-version: v0.56.0 - cache: false + cache: true activate-environment: trueApply the same change to all other jobs (lines 89-93, 115-119, 136-140, 157-161, 178-182, 199-203, 224-228, 249-253, 280-284, 312-316, 338-342, 374-378, 418-422, 449-453).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
.github/workflows/createrelease.yml(2 hunks).github/workflows/tests.yml(17 hunks).gitignore(1 hunks)harpy/snakefiles/metassembly.smk(2 hunks)pixi.toml(1 hunks)
🔇 Additional comments (3)
.gitignore (1)
38-38: LGTM! Appropriate specificity for Pixi files.The change from the broad
pixi*pattern to the specificpixi.lockcorrectly allows trackingpixi.tomlwhile ignoring generated lock files..github/workflows/tests.yml (1)
423-424: Verify the disk space action is necessary.The "Clear Space" step was added specifically to the assembly job. Confirm that disk space was actually a constraint in this job.
harpy/snakefiles/metassembly.smk (1)
1-1: Clarify mixed container execution strategy.The file has inconsistent container declarations:
- Line 1: Global
containerized: "docker://pdimens/harpy:latest"- Line 84:
error_correctionexplicitly usesdocker://pdimens/harpy:assembly_latest- Line 110:
spades_assemblyexplicitly setscontainer: NoneThis creates confusion about which container each rule uses. The
spades_assemblyrule appears to intentionally opt out of containerization whileerror_correctionuses a specific container.Verify the intended container usage:
- Should
spades_assemblyrun without a container, or is this an oversight?- Why does
error_correctionuseassembly_latestinstead of the globallatest?- Should there be consistency between the two rules that use the same underlying tool (
metaspades)?Also applies to: 82-86, 108-112
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
harpy/snakefiles/metassembly.smk (1)
105-110: Fix missing conda environment configuration:spades_assemblyrule references non-existentenvs/assembly.yamlwithcontainer: None.The rule specifies
container: None, which requires a valid local conda environment. However, the referencedenvs/assembly.yamlfile does not exist in the repository. This will cause the rule to fail at runtime.Compare this to the
spades_assemblyrule inassembly.smk(line 49-50), which uses the same conda reference but has a valid Docker container specified, avoiding the local conda requirement.Options to fix:
- Add a valid Docker container image (e.g.,
"docker://pdimens/harpy:assembly_latest") to match the pattern used inassembly.smk, OR- Create the missing
envs/assembly.yamlconda environment fileharpy/commands/demultiplex.py (2)
37-40: Click argument name mismatch will break invocation.@click.argument uses names 'R12_FQ'/'I12_FQ' but the function expects r12_fq/i12_fq. Click matches by parameter name; this will error at runtime.
Apply either approach; preferred is to make argument names lowercase to match the function:
-@click.argument('R12_FQ', required=True, type=FASTQfile(dir_ok= False), nargs=2) -@click.argument('I12_FQ', required=True, type=FASTQfile(dir_ok= False), nargs=2) +@click.argument('r12_fq', required=True, type=FASTQfile(dir_ok= False), nargs=2) +@click.argument('i12_fq', required=True, type=FASTQfile(dir_ok= False), nargs=2)
44-49: Doc typo: index reads.Text says “R1, R2, I2, and I2”; should be “R1, R2, I1, and I2”.
-Use the R1, R2, I2, and I2 FASTQ files provided by the sequencing facility as inputs after the options and schema (4 files, in that exact order). +Use the R1, R2, I1, and I2 FASTQ files provided by the sequencing facility as inputs after the options and schema (4 files, in that exact order).harpy/commands/snp.py (2)
142-149: Same host-only make_windows issue in mpileup path.Please apply the same refactor here: run via a Snakemake rule in the container image or use a Python fallback.
69-75: Refactor to eliminate host-dependency on make_windows CLI invocation—integrate into Snakemake pipeline or call Python function directly.The concern is confirmed. The code invokes
make_windowsviaos.system()at lines 71 and 144 inharpy/commands/snp.py, requiring the binary in PATH on the host and breaking containerization. Since a Python implementation ofmake_windowsalready exists atharpy/scripts/make_windows.py, the best approach is to move region generation into the Snakemake pipeline (executed within the container image) or refactor the Python function to accept parameters directly rather than parsing sys.argv, allowing it to be called from Python code without shell invocation.Two locations require updates:
- Line 71 (freebayes function)
- Line 144 (mpileup function)
♻️ Duplicate comments (5)
.github/workflows/createrelease.yml (3)
47-47: Fix double dollar sign in tag reference.The tag expression uses
$${{ matrix.env }}but should use a single$:${{ matrix.env }}. Double dollar signs will cause the Docker tag to be malformed.Apply this diff:
- tags: pdimens/harpy:$${{ matrix.env }}_{{ github.ref_name }} + tags: pdimens/harpy:${{ matrix.env }}_{{ github.ref_name }}
17-17: Remove invalid matrix entries:stitchis a conda environment andmetassemblydoes not have a dedicated container.The matrix references 11 environments, but only 9 correspond to actual containers built by the harpy containerize flow:
stitchis a conda environment (envs/stitch.yaml), not a container—remove it from the matrixmetassemblydoes not use ametassembly_latestcontainer; metassembly.smk referencesassembly_latestinstead—remove or verify if it should build a separate metassembly containerUpdate the matrix to:
[align, assembly, deconvolution, demultiplex, phase, qc, report, simulations, variants]Apply this diff:
- env: [align, assembly, deconvolution, demultiplex, metassembly, phase, qc, report, simulations, stitch, variants] + env: [align, assembly, deconvolution, demultiplex, phase, qc, report, simulations, variants]
62-65: Update sed pattern to handle both_latestsuffix and standalonelatesttags.The current sed command only replaces
_latest(e.g.,report_latest), but some snakefile container references use a standalonelatesttag (e.g., metassembly.smk:1). Both patterns must be replaced to ensure all container references are versioned consistently.Apply this diff to add a second substitution:
for i in harpy/snakefiles/*.smk; do - sed -i "s/_latest/_${{ github.ref_name }}/g" $i + sed -i "s/_latest/_${{ github.ref_name }}/g; s/\blatest\b/${{ github.ref_name }}/g" $i doneharpy/common/create_pixi.py (2)
75-97: Update pixi version to match CI workflow.The Dockerfile uses
pixi:0.56.0while the CI workflow expectsv0.57.0. This version inconsistency can cause behavioral differences between local and CI environments.Apply this diff to align versions:
-FROM ghcr.io/prefix-dev/pixi:0.56.0 AS build +FROM ghcr.io/prefix-dev/pixi:0.57.0 AS build
104-124: Critical: Path mismatch with CI workflow will cause build failures.The function creates directories under
container/but the CI workflow (createrelease.yml) expects.container/${{ matrix.env }}. This mismatch will cause Docker builds to fail with "context not found" errors.Apply this diff to fix the path:
- shutil.rmtree("container", ignore_errors=True) + shutil.rmtree(".container", ignore_errors=True) for env,deps in environ.items(): - os.makedirs(f"container/{env}", exist_ok=True) + os.makedirs(f".container/{env}", exist_ok=True) - with open(f"container/{env}/Dockerfile", "w") as dockerfile: + with open(f".container/{env}/Dockerfile", "w") as dockerfile: dockerfile.write(dockerfile_text) if env == "report": subprocess.run( - f"pixi init container/{env} -c conda-forge -c r".split(), + f"pixi init .container/{env} -c conda-forge -c r".split(), check = True ) else: subprocess.run( - f"pixi init container/{env} -c conda-forge -c bioconda".split(), + f"pixi init .container/{env} -c conda-forge -c bioconda".split(), check = True ) subprocess.run( - ["pixi", "add", "--no-progress", "--manifest-path", f"container/{env}/pixi.toml", *deps], + ["pixi", "add", "--no-progress", "--manifest-path", f".container/{env}/pixi.toml", *deps], check = True ) - shutil.rmtree("container/.pixi", ignore_errors=True) + shutil.rmtree(".container/.pixi", ignore_errors=True)
🧹 Nitpick comments (1)
harpy/commands/phase.py (1)
29-29: Type mismatch: quiet is an int in CLI.click defines quiet as IntRange(0,2), but Workflow.quiet is annotated bool. Recommend updating the annotation to int to reflect levels. As per the shared snippet in harpy/common/workflow.py.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (21)
.github/workflows/createrelease.yml(3 hunks)harpy/commands/align.py(2 hunks)harpy/commands/assembly.py(1 hunks)harpy/commands/deconvolve.py(1 hunks)harpy/commands/demultiplex.py(1 hunks)harpy/commands/impute.py(1 hunks)harpy/commands/metassembly.py(1 hunks)harpy/commands/phase.py(1 hunks)harpy/commands/qc.py(1 hunks)harpy/commands/resume.py(3 hunks)harpy/commands/snp.py(2 hunks)harpy/commands/sv.py(2 hunks)harpy/commands/validate.py(2 hunks)harpy/common/create_pixi.py(1 hunks)harpy/common/file_ops.py(0 hunks)harpy/common/workflow.py(9 hunks)harpy/scripts/separate_singletons.py(0 hunks)harpy/snakefiles/metassembly.smk(3 hunks)pixi.toml(1 hunks)resources/harpy.yaml(1 hunks)resources/meta.yaml(1 hunks)
💤 Files with no reviewable changes (2)
- harpy/common/file_ops.py
- harpy/scripts/separate_singletons.py
🚧 Files skipped from review as they are similar to previous changes (1)
- pixi.toml
🧰 Additional context used
🧬 Code graph analysis (16)
harpy/commands/phase.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
harpy/commands/qc.py (2)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)harpy/commands/environments.py (2)
container(67-78)container(100-111)
harpy/commands/sv.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
harpy/commands/deconvolve.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
harpy/commands/snp.py (2)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)harpy/commands/environments.py (2)
container(67-78)container(100-111)
harpy/commands/metassembly.py (2)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)harpy/commands/environments.py (3)
container(67-78)conda(32-64)container(100-111)
harpy/snakefiles/metassembly.smk (1)
harpy/common/summaries.py (2)
metassembly(231-267)assembly(78-124)
harpy/commands/align.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
harpy/common/create_pixi.py (1)
harpy/commands/environments.py (2)
containerize(11-51)container(100-111)
harpy/commands/resume.py (2)
harpy/common/conda.py (1)
check_environments(108-130)harpy/common/workflow.py (1)
Workflow(21-294)
harpy/commands/validate.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
harpy/commands/assembly.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
harpy/commands/demultiplex.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
.github/workflows/createrelease.yml (1)
harpy/commands/environments.py (1)
container(100-111)
harpy/commands/impute.py (1)
harpy/common/workflow.py (2)
Workflow(21-294)setup_snakemake(61-108)
harpy/common/workflow.py (4)
harpy/common/file_ops.py (3)
filepath(9-11)gzip_file(18-23)purge_empty_logs(25-32)harpy/commands/environments.py (4)
container(67-78)conda(32-64)containerize(11-51)container(100-111)harpy/common/printing.py (1)
print_error(13-53)harpy/common/conda.py (1)
create_conda_recipes(10-106)
🪛 Ruff (0.14.1)
harpy/common/create_pixi.py
110-110: subprocess call: check for execution of untrusted input
(S603)
115-115: subprocess call: check for execution of untrusted input
(S603)
120-120: subprocess call: check for execution of untrusted input
(S603)
121-121: Starting a process with a partial executable path
(S607)
🔇 Additional comments (27)
resources/harpy.yaml (1)
12-12: Clarify the rationale for therich-clickversion constraint change.The change from pinned
rich-click =1.9to>=1.9.3introduces flexibility in dependency resolution but appears orthogonal to the PR's core objective of replacing Snakemake containers with smaller pixi-driven ones. Allowing open upper bounds on dependencies can introduce unexpected breaking changes in future updates.Consider:
- Is this change intentional or incidental?
- Does
rich-click >=1.9.3introduce any known compatibility issues?- Should you pin a reasonable upper bound (e.g.,
rich-click >=1.9.3,<2.0) to prevent major version breaks?.github/workflows/createrelease.yml (2)
23-28: Verify pixi setup correctly installs harpy.The Install Harpy step uses
prefix-dev/setup-pixi@v0.9.2withactivate-environment: true, but there is no explicit call to install harpy after pixi is set up. Ensure that harpy is installed in the pixi environment (either via pixi.toml or explicitpixi installcall) so that the subsequentharpy containerizecommand succeeds.
26-26: I need to clarify which pixi product we're dealing with. Based on the workflow file context (containerization, pixi-dependent tasks), this appears to be the pixi CLI tool from prefix.dev, not PixiJS. Let me search for information specific to the pixi CLI package manager:Based on the search results, v0.57.0 is the latest stable version of pixi (released on 2025-10-20), making the workflow's use of v0.56.0 a downgrade to an older patch version.
Let me search for specific details about what changed in v0.57.0 to understand why a downgrade might be necessary:
Use the latest pixi version v0.57.0 instead of downgrading to v0.56.0.
v0.57.0 is the latest stable release of pixi, and the workflow is currently downgrading to v0.56.0. No significant breaking changes between these two patch versions were documented. Unless there's a specific compatibility issue with the harpy containerize workflow that requires v0.56.0, upgrading to v0.57.0 would be beneficial for stability and access to recent improvements.
resources/meta.yaml (1)
52-52: LGTM: Dependency version updated.The rich-click version bump to >=1.9.3 is appropriate and aligns with keeping dependencies current.
harpy/common/create_pixi.py (1)
7-73: LGTM: Environment definitions are well-structured.The environment-to-dependency mappings are clear and comprehensive, with appropriate version pins where needed.
harpy/snakefiles/metassembly.smk (2)
79-84: LGTM: Container and conda environment properly specified.The rule correctly specifies both the conda environment and container image, with the proper
metaspades.pycommand.
221-231: LGTM: Output organization improved.The updated params and shell command properly move assembly outputs into the athena directory for better organization.
harpy/common/workflow.py (3)
25-43: LGTM: Constructor signature updated for container-first design.The addition of the
containerparameter to__init__is a cleaner design that makes container usage explicit at workflow creation time. The change is consistently applied across all command modules in this PR.
61-86: LGTM: Method signature simplified by using instance attribute.The removal of the
containerparameter fromsetup_snakemakeand use ofself.containerinstead is consistent with the constructor changes and simplifies the method signature.
266-267: LGTM: Conditional conda recipe creation based on container flag.The logic correctly skips conda recipe creation when using containers, which aligns with the PR's goal of container-based deployment.
harpy/commands/qc.py (1)
49-50: LGTM: Workflow initialization updated to new API.The changes correctly adapt to the new
WorkflowAPI, passingcontainerto the constructor and removing it fromsetup_snakemake.harpy/commands/sv.py (2)
62-63: LGTM: Leviathan workflow initialization updated.The changes correctly implement the new
WorkflowAPI for the leviathan subcommand.
145-146: LGTM: NAIBR workflow initialization updated.The changes correctly implement the new
WorkflowAPI for the naibr subcommand.harpy/commands/metassembly.py (1)
42-44: LGTM: Metassembly workflow updated to new API and consolidated environments.The changes correctly implement the new
WorkflowAPI and remove the redundant "spades" environment in favor of the consolidated "assembly" environment.harpy/commands/assembly.py (1)
49-50: LGTM: Assembly workflow initialization updated to new API.The changes correctly adapt to the new
WorkflowAPI, maintaining consistency with other command modules.harpy/commands/demultiplex.py (1)
50-51: API alignment looks good.Constructor/setup update matches Workflow changes.
harpy/commands/deconvolve.py (1)
36-37: API alignment looks good.Constructor/setup update matches Workflow changes.
harpy/commands/snp.py (2)
59-60: API alignment looks good (freebayes).Matches new Workflow constructor and setup signatures.
132-133: API alignment looks good (mpileup).Matches new Workflow constructor and setup signatures.
harpy/commands/validate.py (2)
44-46: API alignment looks good (validate_bam).Constructor/setup update matches Workflow changes.
89-90: API alignment looks good (validate_fastq).Constructor/setup update matches Workflow changes.
harpy/commands/align.py (2)
54-55: LGTM! Clean API refactor.The container parameter is correctly moved from
setup_snakemake()to theWorkflowconstructor, aligning with the new API design.
125-126: LGTM! Consistent refactor.The strobe command follows the same pattern as bwa, correctly passing
containerto the constructor and removing it fromsetup_snakemake().harpy/commands/impute.py (1)
45-46: LGTM! Consistent with the refactor pattern.The impute workflow correctly adopts the new API, passing
containerto the constructor and removing it fromsetup_snakemake().harpy/commands/resume.py (3)
8-8: LGTM! Simplified imports.Removing
create_conda_recipesfrom the import is correct sinceWorkflow.initialize()now handles conda recipe creation internally.
17-29: LGTM! Cleaner API surface.Removing the
condaparameter and deriving it from the workflow configuration simplifies the interface and ensures consistency with the original workflow setup.
65-66: Clean mutation with clear intent.Using
pop()to both assign and removeinputsfrom the config is appropriate given the comment explaining the intent.
There was a problem hiding this comment.
Actionable comments posted: 4
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
harpy/commands/simulate_variants.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
harpy/commands/simulate_variants.py (2)
harpy/common/workflow.py (1)
setup_snakemake(61-108)harpy/commands/sv.py (1)
naibr(128-190)
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)
83-97: Add container or conda directives to rules using external tools.Several rules use external tools but lack container or conda directives. Since this PR removes the top-level container in favor of per-rule containers, these rules may fail if the required tools aren't available in the execution environment:
preprocess_reference(lines 83-97): requiresseqtkandsamtoolsindex_snps(lines 99-105): requiresbcftoolsindex_snps_gz(lines 107-113): requirestabixindex_alignments(lines 115-121): requiressamtoolsconcat_groups(lines 179-198): requiresconcatenate_bamandsamtoolsinfer_variants(lines 238-254): requiresinfer_svConsider adding container directives similar to the other rules. For example:
rule preprocess_reference: input: genomefile output: geno = workflow_geno, fai = f"{workflow_geno}.fai" log: f"{workflow_geno}.preprocess.log" + container: + "docker://pdimens/harpy:align_latest" shell: """ {{ seqtk seq {input} > {output.geno} samtools faidx --fai-idx {output.fai} {output.geno} }} 2> {log} """Apply similar patterns to the other rules, choosing the appropriate container based on the tools used (likely
align_latestorvariants_latest).Also applies to: 99-113, 179-198, 238-254
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
harpy/snakefiles/sv_naibr_phase.smk(5 hunks)harpy/snakefiles/sv_naibr_pop_phase.smk(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- harpy/snakefiles/sv_naibr_phase.smk
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: qc
- GitHub Check: freebayes
- GitHub Check: align strobe
- GitHub Check: simulate variants
- GitHub Check: align BWA
- GitHub Check: phase
- GitHub Check: mpileup
- GitHub Check: assembly
- GitHub Check: sv naibr
- GitHub Check: sv leviathan
- GitHub Check: impute
- GitHub Check: validate
🔇 Additional comments (1)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)
115-121: Verify the intended behavior for index file location.The issue identified in the review comment is real: the
index_alignmentsrule declares output as"{sample}.bai"(working directory), butsamtools index {input}creates the index file next to the input BAM file. Sinceget_alignmentsreturns a full path frombamlist(line 73), there's a mismatch. Theget_align_indexfunction (line 81) correctly returnsaln[0] + ".bai"(full path), expecting the index next to the BAM file. This mismatch breaks Snakemake's dependency tracking, andphase_alignmentswon't correctly depend onindex_alignments.However, the proposed fix requires clarification:
- If indexes should be next to BAM files (recommended for portability): Fix the output declaration to return a path matching where samtools creates it, rather than changing the command.
- If indexes should be in the working directory: The command should be
samtools index {input} {output}, ANDget_align_indexmust also be updated to returnf"{wildcards.sample}.bai"instead of the full path.Confirm which approach aligns with the intended workflow architecture before applying the fix.
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
harpy/reports/align_bxstats.qmd(1 hunks)harpy/reports/align_stats.qmd(1 hunks)harpy/reports/bcftools_stats.qmd(1 hunks)harpy/reports/hapcut.qmd(1 hunks)harpy/reports/impute.qmd(1 hunks)harpy/reports/leviathan.qmd(1 hunks)harpy/reports/leviathan_pop.qmd(1 hunks)harpy/reports/naibr.qmd(1 hunks)harpy/reports/naibr_pop.qmd(1 hunks)harpy/reports/qc_bx_stats.qmd(1 hunks)harpy/reports/stitch_collate.qmd(1 hunks)harpy/reports/validate_bam.qmd(1 hunks)harpy/reports/validate_fastq.qmd(1 hunks)
🔇 Additional comments (5)
harpy/reports/stitch_collate.qmd (1)
17-22: LGTM! Consistent with containerization approach.Same pattern as other reports in this PR—replacing dynamic installation with explicit imports. The change is appropriate and maintains consistency across the codebase.
harpy/reports/align_bxstats.qmd (1)
8-13: LGTM! Explicit imports align with containerization goals.The refactoring is consistent with the broader PR strategy. The explicit
library()calls make dependencies clear and ensure failures occur early if packages are missing from the container.harpy/reports/leviathan.qmd (1)
19-24: LGTM! Consistent pattern across reports.The change maintains consistency with the containerization approach used throughout this PR. Explicit imports improve clarity and enable early failure detection.
harpy/reports/validate_bam.qmd (1)
9-14: LGTM! Completes consistent refactoring across all reports.This final report file follows the same pattern established across all reviewed files. The uniform approach to replacing dynamic installation with explicit imports is clean and well-executed.
harpy/reports/naibr.qmd (1)
42-47: LGTM! Explicit package imports are properly supported by environment configuration.The shift from dynamic package installation to explicit
library()calls is well-founded. All required packages (r-dt, r-dplyr, r-tidyr, r-biocircos) are defined in the environment configuration, so the fail-fast behavior will correctly surface any issues during environment setup rather than at runtime.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)
137-138: Container image availability concern (previously flagged).Per-rule container directives have been added across this workflow, referencing
docker://pdimens/harpy:phase_latest,docker://pdimens/harpy:variants_latest, anddocker://pdimens/harpy:report_latest. This aligns with the PR's objective of using smaller, specialized containers.However, a previous review noted that these images may not exist on Docker Hub. Ensure these container images are built and pushed before merging, or the workflow will fail at runtime when Snakemake attempts to pull them.
Run the following script to verify the container images are accessible:
#!/bin/bash # Description: Verify that the specified container images exist and are pullable echo "Checking container images..." for tag in phase_latest variants_latest report_latest; do echo "Checking pdimens/harpy:${tag}..." docker pull "pdimens/harpy:${tag}" --quiet && echo "✓ Found" || echo "✗ Not found" doneAlso applies to: 231-232, 315-316, 344-345
harpy/snakefiles/sv_naibr_phase.smk (1)
114-115: Container image availability concern (previously flagged).The per-rule container directives reference the same
pdimens/harpyimages as insv_naibr_pop_phase.smk. Please ensure these images are built and accessible before merging to avoid runtime failures.See verification script in the review comment for
harpy/snakefiles/sv_naibr_pop_phase.smkat lines 137-138.Also applies to: 177-178, 261-262
🧹 Nitpick comments (2)
harpy/common/printing.py (1)
119-134: Consider adding type hints for consistency.The new
print_shellcmd_simplefunction is well-implemented and integrates nicely with the existing printing utilities. However, it lacks type hints for thetextparameter and return value, unlike other functions in this module.Apply this diff to add type hints:
-def print_shellcmd_simple(text): +def print_shellcmd_simple(text: str) -> None:harpy/snakefiles/sv_naibr_phase.smk (1)
71-81: Consider reordering outputs for consistency.The
index_alignmentsrule lists the BAI file before the namedbamparameter in the outputs (lines 75-76). While functionally correct, it's more conventional to list named outputs first or keep them in logical order (BAM before BAI).Apply this diff for better readability:
output: - temp("workflow/input/bam/{sample}.bam.bai"), - bam = temp("workflow/input/bam/{sample}.bam") + bam = temp("workflow/input/bam/{sample}.bam"), + bai = temp("workflow/input/bam/{sample}.bam.bai")Then update the shell command to reference the named output:
shell: """ ln -sr {input} {output.bam} - samtools index {output.bam} + samtools index {output.bam} {output.bai} """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
harpy/commands/diagnose.py(2 hunks)harpy/common/launch.py(2 hunks)harpy/common/printing.py(2 hunks)harpy/common/workflow.py(10 hunks)harpy/snakefiles/sv_naibr_phase.smk(5 hunks)harpy/snakefiles/sv_naibr_pop_phase.smk(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
harpy/common/printing.py (1)
harpy/common/cli_types_params.py (1)
sanitize_shell(8-9)
harpy/common/workflow.py (5)
harpy/common/file_ops.py (3)
filepath(9-11)gzip_file(18-23)purge_empty_logs(25-32)harpy/commands/environments.py (4)
container(67-78)conda(32-64)containerize(11-51)container(100-111)harpy/common/printing.py (1)
print_error(15-55)harpy/common/launch.py (1)
launch_snakemake(95-307)harpy/commands/sv.py (1)
naibr(128-190)
harpy/snakefiles/sv_naibr_phase.smk (1)
harpy/commands/sv.py (1)
naibr(128-190)
harpy/snakefiles/sv_naibr_pop_phase.smk (1)
harpy/common/summaries.py (2)
naibr_phase_pop(730-771)phase(269-309)
harpy/commands/diagnose.py (2)
harpy/common/printing.py (2)
print_error(15-55)print_shellcmd_simple(119-134)harpy/common/file_ops.py (1)
safe_read(34-41)
🪛 Ruff (0.14.1)
harpy/commands/diagnose.py
106-106: Starting a process with a shell, possible injection detected
(S605)
159-159: Starting a process with a shell, possible injection detected
(S605)
163-163: Cannot use an escape sequence (backslash) in f-strings on Python 3.11 (syntax was added in Python 3.12)
(invalid-syntax)
167-167: Starting a process with a shell, possible injection detected
(S605)
170-170: Cannot use an escape sequence (backslash) in f-strings on Python 3.11 (syntax was added in Python 3.12)
(invalid-syntax)
176-176: Starting a process with a shell, possible injection detected
(S605)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: impute
- GitHub Check: mpileup
- GitHub Check: freebayes
- GitHub Check: phase
- GitHub Check: align BWA
- GitHub Check: align strobe
- GitHub Check: simulate variants
- GitHub Check: qc
- GitHub Check: sv naibr
- GitHub Check: deconvolve
- GitHub Check: sv leviathan
- GitHub Check: assembly
- GitHub Check: validate
🔇 Additional comments (5)
harpy/common/workflow.py (3)
25-46: LGTM! Clean API refactoring.The addition of the
containerparameter to theWorkflowconstructor and making it an instance attribute (rather than a parameter tosetup_snakemake) is a cleaner design that better reflects the container mode as a workflow-level configuration.
168-183: Improved error handling for missing snakefiles.The rewritten
fetch_snakefilemethod now provides clearer error messages and properly handles missing snakefile resources with explicit directory creation.
266-267: Code change is correct; conda recipe creation is properly gated in container mode.Verification confirms the conditional at lines 266–267 correctly skips conda recipe creation when
self.containeris True. The only other call site inenvironments.py:53is intentional, as it explicitly creates a non-container workflow where conda recipes should always be generated. No issues found.harpy/common/launch.py (1)
95-95: LGTM! API change is consistent with Workflow refactoring.The removal of the
workflowparameter fromlaunch_snakemakealigns with the changes inharpy/common/workflow.pywherecontaineris now an instance attribute rather than being passed through method parameters.harpy/snakefiles/sv_naibr_pop_phase.smk (1)
108-118: LGTM! Cleaner alignment indexing pattern.The updated
index_alignmentsrule now uses a symlink approach to avoid duplicating alignment data, and explicitly declares both the BAM and BAI as temporary outputs. This is more efficient and clearer than the previous implementation.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
resources/changelog.md (1)
1-24: Changelog is missing documentation of major architectural changes.The PR objectives describe significant infrastructure changes (replacing Snakemake containers with Pixi-driven multi-stage builds, updating the Workflow API, removing
fetch_snakefile, adjusting CI dependencies), but none of these are documented in the changelog. This creates a gap between the PR's scope and what users/developers will see when reading release notes.Add entries documenting:
- Container refactoring: single large container → multiple smaller environment-tagged containers with Pixi multistage builds
- Workflow API: updated to accept a container flag
- Removed:
fetch_snakefilefunction- Removed: singleton-separation script
- Updated: CI and report dependency handling
- Added: per-rule container directives across Snakefiles
Consider creating a separate subsection (e.g., "# infrastructure" or "# internal") to document these breaking/significant internal changes, or expand the "# changes" section to include them.
🧹 Nitpick comments (1)
resources/meta.yaml (1)
49-49: Loose version constraint oncondadependency.Line 49 specifies
conda >24.7without an upper bound. This is permissive and could introduce issues if a future conda version introduces breaking changes. Consider pinning an upper bound (e.g.,conda >=24.7,<25) to better protect against version drift.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
resources/changelog.md(1 hunks)resources/harpy.yaml(0 hunks)resources/meta.yaml(1 hunks)
💤 Files with no reviewable changes (1)
- resources/harpy.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (14)
- GitHub Check: sv naibr
- GitHub Check: impute
- GitHub Check: align BWA
- GitHub Check: qc
- GitHub Check: align strobe
- GitHub Check: sv leviathan
- GitHub Check: freebayes
- GitHub Check: phase
- GitHub Check: mpileup
- GitHub Check: simulate variants
- GitHub Check: deconvolve
- GitHub Check: assembly
- GitHub Check: validate
- GitHub Check: demux meier2021
🔇 Additional comments (1)
resources/meta.yaml (1)
52-53: Dependency versions are compatible, but entry point discrepancy needs clarification.Version upgrades are sound: rich-click 1.9.3 and snakemake-minimal 9.11 have minimal breaking changes and should work with the existing codebase.
However, the original concern about entry point removals is based on incomplete information. The four entry points in question—
deconvolve_alignments,extract_bxtags,separate_singletons, andseparate_validbx—are still defined inpyproject.toml(lines 30, 32, 40, 41) and therefore exist in the codebase. They are not present inmeta.yaml(lines 20-37).This indicates the conda recipe intentionally excludes these entry points from the conda package. Verify whether this omission is intentional (e.g., these scripts are not meant to be installed via conda) or if
meta.yamlshould be synced withpyproject.tomlto include all defined entry points.
This PR replaces the one big snakemake-created containerized container in favor of multiple smaller and env_tagged containers that have the software installed using multistaged pixi builds within them.
Summary by CodeRabbit
New Features
Chores
Removals
Behavior