Skip to content

refactor: replace Any annotations, use Path for local paths, add missing docstrings#28

Merged
ameynert merged 9 commits intomainfrom
am_code_quality
Apr 20, 2026
Merged

refactor: replace Any annotations, use Path for local paths, add missing docstrings#28
ameynert merged 9 commits intomainfrom
am_code_quality

Conversation

@ameynert
Copy link
Copy Markdown
Collaborator

@ameynert ameynert commented Apr 17, 2026

Summary

Addresses issues #20, #22, and #25 — all closely related code-quality improvements touching the same files.

Test plan

  • uv run --directory divref poe check-all passes (mypy and pytest; format/lint failures are pre-existing in the untracked extract_gnomad_single_afs.py file)
  • test_compute_haplotypes still passes with temp_dir as Path

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Add reference-genome parameter for FASTA and variation ratio workflows.
    • Added upfront path validation to catch missing/unwritable files early.
  • Refactor

    • Tools now accept standard pathlib.Path for filesystem arguments.
    • Removed obsolete credentials parameter from gnomAD sites VCF tool.
  • Documentation

    • Expanded function docstrings and clarified test descriptions.
  • Chores

    • Added runtime dependencies: fgmetric and fgpyo.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

Standardizes type annotations (replacing Any/HailPath with hl.Expression/hl.Table and pathlib.Path), adds filesystem precondition checks, parameterizes reference genome use, updates result models to fgmetric.Metric, and adds runtime deps fgmetric and fgpyo.

Changes

Cohort / File(s) Summary
Type Annotation Refinement
divref/divref/haplotype.py
Tightened function signatures and nested helpers to use hail types (hl.Expression); removed unused Any import.
Docstrings & Tests (non-functional)
divref/divref/main.py, divref/tests/test_haplotype.py, divref/tests/tools/test_remap_divref.py, divref/tests/tools/test_rewrite_fasta.py
Expanded setup_logging docstring and added descriptive test docstrings; no runtime changes.
Path Type Migration & Validation — Core Tools
divref/divref/tools/compute_haplotypes.py, divref/divref/tools/compute_haplotype_statistics.py, divref/divref/tools/compute_variation_ratios.py, divref/divref/tools/create_fasta_and_index.py
Replaced HailPath with pathlib.Path; coerce Pathstr for Hail I/O; add assert_* filesystem checks; introduce reference_genome parameter where previously hardcoded.
Path Type Migration — Utility Tools
divref/divref/tools/create_gnomad_sites_vcf.py, divref/divref/tools/extract_gnomad_afs.py, divref/divref/tools/extract_sample_metadata.py, divref/divref/tools/gnomad_hail_table_test_data.py
Converted path parameters to Path, added precondition checks, removed gcs_credentials_path from create_gnomad_sites_vcf, and stringified paths for Hail calls.
Result Model & Output Refactoring
divref/divref/tools/compute_haplotype_statistics.py
Switched internal result models from pydantic.BaseModel to fgmetric.Metric; replaced in-memory accumulation + manual TSV writes with streaming MetricWriter usage.
Tests — Path handling & mocks
divref/tests/tools/test_compute_haplotypes.py, divref/tests/tools/test_create_gnomad_sites_vcf.py, divref/tests/tools/test_extract_gnomad_afs.py, divref/tests/tools/test_extract_sample_metadata.py
Updated tests to pass Path objects, adjusted mocks to target hl.init, and convert to str only when invoking Hail APIs.
Dependencies
divref/pyproject.toml
Added runtime dependencies fgmetric>=0.2.0 and fgpyo>=1.5.1.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Poem

🐰 Paths now tidy, strings set free,

Hail expressions typed with glee,
Checkpoints guarded, genomes set right,
Metrics streaming through the night—
A rabbit hops, refactor complete! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately summarizes the three main changes: replacing Any type annotations with specific Hail types, changing HailPath to Path for local filesystem paths, and adding missing docstrings to functions.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch am_code_quality

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
divref/divref/main.py (1)

30-35: Add Returns: to complete the Google-style docstring block.

Nice improvement adding Args:. For full compliance, add a Returns: section (None) to setup_logging’s docstring.

Docstring completion diff
 def setup_logging(level: str = "INFO") -> None:
     """
     Set up basic logging to print to the console.

     Args:
         level: Logging level string (e.g. "INFO", "DEBUG", "WARNING").
+
+    Returns:
+        None.
     """

As per coding guidelines, "Use Google-style docstrings with Args:, Returns:, Yields:, and Raises: blocks."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/divref/main.py` around lines 30 - 35, The docstring for setup_logging
is missing a Returns section; update the Google-style docstring of the
setup_logging function to include a "Returns:" block that documents it returns
None (e.g., "Returns: None: No return value."), keeping the existing Args: block
and overall style consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@divref/divref/alias.py`:
- Line 1: The HailExpression alias currently uses Any which disables type
checking; change the alias declaration (HailExpression: TypeAlias = Any) to
reference the concrete Hail expression type (e.g., HailExpression: TypeAlias =
hl.Expression or hail.expr.Expression), and ensure you import TypeAlias from
typing (or typing_extensions) and import hail as hl (or import
hail.expr.Expression) so the new type name resolves; update the alias definition
in the file (symbol: HailExpression, TypeAlias, Any) accordingly.

In `@divref/divref/main.py`:
- Line 13: The import of extract_gnomad_single_afs is failing because the module
divref.tools.extract_gnomad_single_afs is missing; either add a new module that
defines and exports the function extract_gnomad_single_afs (implement expected
CLI tool behavior) and then keep the registration in _tools, or remove the
extract_gnomad_single_afs import and its registration entry from the _tools list
in main.py until the new tool file is added so mypy/pytest can collect; locate
the symbol extract_gnomad_single_afs and the _tools registration in main.py to
apply the change.

---

Nitpick comments:
In `@divref/divref/main.py`:
- Around line 30-35: The docstring for setup_logging is missing a Returns
section; update the Google-style docstring of the setup_logging function to
include a "Returns:" block that documents it returns None (e.g., "Returns: None:
No return value."), keeping the existing Args: block and overall style
consistent.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9dd7665b-5fe4-476b-aa7c-f107c6fc8876

📥 Commits

Reviewing files that changed from the base of the PR and between 316c402 and ccb46c9.

📒 Files selected for processing (5)
  • divref/divref/alias.py
  • divref/divref/haplotype.py
  • divref/divref/main.py
  • divref/divref/tools/compute_haplotypes.py
  • divref/tests/tools/test_compute_haplotypes.py

Comment thread divref/divref/alias.py Outdated
Comment thread divref/divref/main.py Outdated
ameynert and others added 6 commits April 20, 2026 15:14
Issues #20, #22, #25:

- Add HailExpression type alias in alias.py to document Hail DSL
  expression parameters (instead of bare Any)
- Replace Any with hl.Table/HailExpression in haplotype.py and
  compute_haplotypes.py function signatures
- Add docstrings to nested helper functions in haplotype.py
  (get_chunk_until_next_variant, get_range) and compute_haplotypes.py
  (agg_haplos, collapse_haplos_across_samples, map_haplo_group,
  get_haplotype_summary, get_variant, get_gnomad_freq)
- Change temp_dir parameter from HailPath to Path in compute_haplotypes
  (it is always a local filesystem path passed to hl.init)
- Expand setup_logging docstring with Args section

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 20, 2026 23:21 — with GitHub Actions Inactive
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
divref/divref/tools/create_fasta_and_index.py (1)

296-305: ⚠️ Potential issue | 🔴 Critical

Critical: Missing reference_genome argument in call to build_haplotype_table.

The mypy pipeline error confirms that reference_genome is required by build_haplotype_table (line 31) but is not passed in this call. This also explains the ARG001 warning—reference_genome is accepted as a parameter but never used.

Proposed fix
     ht, pops_legend = build_haplotype_table(
         haplotypes_table_path=haplotypes_table_path,
         gnomad_va_file=gnomad_va_file,
         reference_fasta=reference_fasta,
         window_size=window_size,
         frequency_cutoff=frequency_cutoff,
         merge=merge,
         version_str=version_str,
+        reference_genome=reference_genome,
         tmp_dir=tmp_dir,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/divref/tools/create_fasta_and_index.py` around lines 296 - 305, The
call to build_haplotype_table in create_fasta_and_index.py is missing the
required reference_genome argument; update the call to pass the appropriate
reference_genome variable (the same one used elsewhere in this module) so the
signature matches build_haplotype_table(reference_genome=...,...). Ensure you
add the named parameter reference_genome=reference_genome to the argument list
for build_haplotype_table to resolve the mypy ARG001 error and ensure the
parameter is used.
🧹 Nitpick comments (1)
divref/divref/tools/gnomad_hail_table_test_data.py (1)

50-50: Inconsistent: Missing str() conversion for Hail write calls.

Other tools in this PR consistently use str(path) when passing paths to Hail I/O methods. These lines pass Path objects directly, which is inconsistent with the pattern established elsewhere (e.g., extract_gnomad_afs.py line 64, extract_sample_metadata.py line 32).

♻️ Proposed fix for consistency
     logger.info(f"Writing {va_subset.count()} variants to {out_variant_annotation_table}.")
-    va_subset.write(out_variant_annotation_table, overwrite=True)
+    va_subset.write(str(out_variant_annotation_table), overwrite=True)

     sa = hl.read_table(in_gnomad_hgdp_sample_metadata)
     sa = sa.select("gnomad_population_inference").select_globals()

     logger.info(f"Writing {sa.count()} samples to {out_sample_metadata}.")
-    sa.naive_coalesce(1).write(out_sample_metadata, overwrite=True)
+    sa.naive_coalesce(1).write(str(out_sample_metadata), overwrite=True)

Also applies to: 56-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@divref/divref/tools/gnomad_hail_table_test_data.py` at line 50, The write
call uses a Path object directly; update the Hail I/O calls to pass string paths
by wrapping the Path variables with str()—specifically change the
va_subset.write call that takes out_variant_annotation_table to use
str(out_variant_annotation_table) (and likewise any other write call at the
referenced location, e.g., the second write at line 56) so the Hail write
invocations are consistent with other tools.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@divref/divref/tools/compute_haplotype_statistics.py`:
- Around line 6-7: The project is missing the external dependency used by
compute_haplotype_statistics.py: add the fgmetric package (which provides Metric
and MetricWriter used as imports and base classes in
compute_haplotype_statistics.py) to the project's dependency declarations (e.g.,
pyproject.toml [tool.poetry.dependencies] or setup.cfg/requirements.txt as
appropriate for this repo) so the imports "from fgmetric import Metric,
MetricWriter" resolve and CI installs fgmetric before running tests or
packaging.

In `@divref/divref/tools/compute_variation_ratios.py`:
- Line 39: The assertion assert_path_is_readable(vcfs_path) conflicts with the
docstring and hl.import_vcf's glob support: remove or replace it so glob
patterns are allowed; specifically, in compute_variation_ratios.py update the
validation for vcfs_path (instead of calling assert_path_is_readable(vcfs_path))
either remove that check entirely or implement a glob-aware check (e.g., use
glob.glob on vcfs_path and assert that it yields at least one match) before
calling hl.import_vcf to ensure both literal paths and glob patterns work as
intended.

In `@divref/divref/tools/create_fasta_and_index.py`:
- Line 294: hl.init is being passed a Path object (tmp_dir) but expects a string
path; update the call to hl.init(tmp_dir=...) to pass tmp_dir as a str by
converting the Path (e.g., use str(tmp_dir)) so hl.init receives a string.
Ensure you change the hl.init invocation that currently reads
hl.init(tmp_dir=tmp_dir) and leave other logic untouched.
- Around line 55-59: The call to
hl.get_reference(...).add_sequence(reference_fasta) passes a Path object;
convert reference_fasta to a string so Hail's add_sequence receives a file path
string. Update the usage where reference_fasta is passed (the add_sequence call
in this scope) to call str(reference_fasta) or otherwise ensure it's a str; keep
the hl.get_reference(...) and add_sequence(...) calls unchanged except for
converting the argument to a string.
- Around line 3-18: The import block is unsorted: move "from divref import
defaults" into the local/project import group with "from divref.haplotype import
get_haplo_sequence" and "from divref.haplotype import split_haplotypes" (and
then alphabetize within that group) so all divref imports are grouped together;
ensure standard-library, third-party (duckdb, hail, polars, fgpyo), and local
imports are each in their own sections and run isort or follow the existing
project import ordering conventions to pass CI.

---

Outside diff comments:
In `@divref/divref/tools/create_fasta_and_index.py`:
- Around line 296-305: The call to build_haplotype_table in
create_fasta_and_index.py is missing the required reference_genome argument;
update the call to pass the appropriate reference_genome variable (the same one
used elsewhere in this module) so the signature matches
build_haplotype_table(reference_genome=...,...). Ensure you add the named
parameter reference_genome=reference_genome to the argument list for
build_haplotype_table to resolve the mypy ARG001 error and ensure the parameter
is used.

---

Nitpick comments:
In `@divref/divref/tools/gnomad_hail_table_test_data.py`:
- Line 50: The write call uses a Path object directly; update the Hail I/O calls
to pass string paths by wrapping the Path variables with str()—specifically
change the va_subset.write call that takes out_variant_annotation_table to use
str(out_variant_annotation_table) (and likewise any other write call at the
referenced location, e.g., the second write at line 56) so the Hail write
invocations are consistent with other tools.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f9c8dbc-caa0-4527-973e-ff1ed98fff88

📥 Commits

Reviewing files that changed from the base of the PR and between ccb46c9 and 662a331.

⛔ Files ignored due to path filters (2)
  • divref/uv.lock is excluded by !**/*.lock
  • pixi.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • divref/divref/haplotype.py
  • divref/divref/main.py
  • divref/divref/tools/compute_haplotype_statistics.py
  • divref/divref/tools/compute_haplotypes.py
  • divref/divref/tools/compute_variation_ratios.py
  • divref/divref/tools/create_fasta_and_index.py
  • divref/divref/tools/create_gnomad_sites_vcf.py
  • divref/divref/tools/extract_gnomad_afs.py
  • divref/divref/tools/extract_sample_metadata.py
  • divref/divref/tools/gnomad_hail_table_test_data.py
  • divref/pyproject.toml
  • divref/tests/test_haplotype.py
  • divref/tests/tools/test_compute_haplotypes.py
  • divref/tests/tools/test_create_gnomad_sites_vcf.py
  • divref/tests/tools/test_extract_gnomad_afs.py
  • divref/tests/tools/test_extract_sample_metadata.py
  • divref/tests/tools/test_remap_divref.py
  • divref/tests/tools/test_rewrite_fasta.py
✅ Files skipped from review due to trivial changes (6)
  • divref/divref/main.py
  • divref/pyproject.toml
  • divref/tests/tools/test_extract_sample_metadata.py
  • divref/tests/tools/test_rewrite_fasta.py
  • divref/tests/tools/test_remap_divref.py
  • divref/tests/test_haplotype.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • divref/tests/tools/test_compute_haplotypes.py
  • divref/divref/haplotype.py
  • divref/divref/tools/compute_haplotypes.py

Comment thread divref/divref/tools/compute_haplotype_statistics.py
Comment thread divref/divref/tools/compute_variation_ratios.py
Comment thread divref/divref/tools/create_fasta_and_index.py Outdated
Comment thread divref/divref/tools/create_fasta_and_index.py Outdated
Comment thread divref/divref/tools/create_fasta_and_index.py Outdated
@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 20, 2026 23:33 — with GitHub Actions Inactive
@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 20, 2026 23:34 — with GitHub Actions Inactive
@ameynert ameynert temporarily deployed to github-actions-snakemake-linting April 20, 2026 23:36 — with GitHub Actions Inactive
@ameynert ameynert linked an issue Apr 20, 2026 that may be closed by this pull request
@ameynert ameynert merged commit bc0790e into main Apr 20, 2026
3 checks passed
@ameynert ameynert deleted the am_code_quality branch April 20, 2026 23:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant