This workflow is inspired by the DivRef repository, which generates a bundle of FASTA sequences and a corresponding DuckDB index of common human variation.
The original implementation consists of a set of standalone Python scripts and a Makefile.
This implementation:
- Wraps the Python scripts in a toolkit with type annotations, improved parameterization, and unit tests.
- Adds a Snakemake workflow and associated configuration to drive the resource generation process.
The environment for this analysis is managed using pixi.
Follow the developer instructions to install pixi.
The environment is created and its dependencies are installed automatically the first time you run pixi install.
To enable access to Hail tables via the GCS Connector, run pixi run setup-gcs.
You will need to log in to GCS before running any of the Hail-dependent tools:
gcloud auth application-default login
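
Before running the Hail-dependent tools, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the repository: `check_tool` is a hypothetical helper introduced here for illustration, and the script assumes that a successful `gcloud auth application-default print-access-token` call is a reasonable proxy for a completed login.

```shell
#!/usr/bin/env sh
# Sketch: confirm the prerequisites named above before running Hail-dependent tools.
# check_tool is a hypothetical helper, not something the repository provides.

# Print whether a command is available on PATH; return non-zero if it is not.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Check the two CLIs this section relies on.
missing=0
for tool in pixi gcloud; do
  check_tool "$tool" || missing=1
done

if [ "$missing" -eq 0 ]; then
  # Assumption: this call succeeds only when application-default credentials exist.
  if gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "GCS credentials: ok"
  else
    echo "GCS credentials: not configured; run 'gcloud auth application-default login'"
  fi
fi
```

The script only reports status and never changes the environment, so it is safe to rerun after each setup step.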