This pipeline is a reproducible workflow for calculating and validating polygenic risk scores (PRS) for Type 2 Diabetes (T2D) across the UK Biobank, All of Us, and Lyday datasets. It uses Python, R, and Nextflow to run PRS analysis with ancestry adjustment and statistical validation.
| Folder | Description |
|---|---|
| `Data/` | Describes each cohort: Lyday, All of Us, and UK Biobank. Includes info on preprocessing. |
| `Tools/` | Contains usage notes for core tools like pgsc_calc, PRSice-2, PRScs, Beagle, and more. |
| `Ancestry Adjustment/` | Focuses on ancestry-normalized PRS and includes the logic replicated from the pgsc_calc framework. |
| `Validation/` | Scripts and results for statistical validation using R. |
| `PRS_Pipeline/` | (Coming soon) Python-based CLI pipeline to modularly run each step with checkpointing. |
- UK Biobank: Used for testing and benchmarking PRS tools (data already imputed and standardized)
- All of Us: PRS calculation and ancestry adjustment using `pgsc_calc`
- Lyday: Raw VCFs requiring genotype harmonization, imputation, and conversion prior to PRS calculation
Each dataset has its own documentation inside the Data/ directory.
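The ancestry adjustment mentioned above can be illustrated with a minimal sketch. It assumes a PC-regression approach: fit raw PRS against genetic principal components in a reference panel, then express each target score as a standardized residual. The function names (`fit_pc_adjustment`, `adjust_prs`) and the simulated data are hypothetical, and this is a simplified illustration of the general idea, not the code used by `pgsc_calc` or by this repository.

```python
import numpy as np

def fit_pc_adjustment(ref_prs, ref_pcs):
    """Fit PRS ~ intercept + PCs in the reference panel.

    Returns the regression coefficients and the residual standard deviation.
    """
    X = np.column_stack([np.ones(len(ref_prs)), ref_pcs])
    coef, *_ = np.linalg.lstsq(X, ref_prs, rcond=None)
    residuals = ref_prs - X @ coef
    return coef, residuals.std(ddof=1)

def adjust_prs(prs, pcs, coef, resid_sd):
    """Subtract the PC-predicted PRS and scale by the reference residual SD."""
    X = np.column_stack([np.ones(len(prs)), pcs])
    return (prs - X @ coef) / resid_sd

# Simulated reference panel (illustrative values only): PRS drifts with PC1 and PC2.
rng = np.random.default_rng(0)
ref_pcs = rng.normal(size=(500, 4))
ref_prs = 0.8 * ref_pcs[:, 0] - 0.5 * ref_pcs[:, 1] + rng.normal(scale=0.3, size=500)

coef, resid_sd = fit_pc_adjustment(ref_prs, ref_pcs)
ref_z = adjust_prs(ref_prs, ref_pcs, coef, resid_sd)

# After adjustment the reference scores no longer track the leading PC.
print(abs(np.corrcoef(ref_z, ref_pcs[:, 0])[0, 1]) < 1e-8)
```

In practice the coefficients would be fitted on the reference panel and then applied to target samples projected into the same PC space, so that scores are comparable across ancestry groups.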
- `pgsc_calc`: Nextflow-based official PRS scoring pipeline
- `PRSice-2`: Clumping + thresholding tool
- `PRScs`: Bayesian regression method for PRS
- `Beagle 5.5`: Imputation
- `conform-gt`: Genotype harmonization
- Python for pipeline scripting
- R for statistical validation (e.g., AUC, Odds Ratio, Cohen’s d)
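
To make the validation metrics concrete, here is a small sketch of AUC, Cohen's d, and a top-versus-rest odds ratio computed from PRS values and case/control labels. The repository performs validation in R; this Python version is only an illustration of what each metric measures. The function names, the `top_fraction` parameter, the 0.5 continuity correction, and the toy data are all assumptions made for the example.

```python
from math import sqrt
from statistics import mean, stdev

def split(scores, labels):
    """Separate scores into cases (label 1) and controls (label 0)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    return cases, controls

def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases, controls = split(scores, labels)
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def cohens_d(scores, labels):
    """Standardized mean difference between cases and controls (pooled SD)."""
    cases, controls = split(scores, labels)
    n1, n0 = len(cases), len(controls)
    pooled_var = ((n1 - 1) * stdev(cases) ** 2 + (n0 - 1) * stdev(controls) ** 2) / (n1 + n0 - 2)
    return (mean(cases) - mean(controls)) / sqrt(pooled_var)

def top_vs_rest_odds_ratio(scores, labels, top_fraction=0.2):
    """Odds of being a case in the top PRS fraction versus everyone else.

    Adds 0.5 to each cell of the 2x2 table so empty cells do not divide by zero.
    """
    ranked = sorted(zip(scores, labels), reverse=True)
    n_top = max(1, int(len(ranked) * top_fraction))
    top, rest = ranked[:n_top], ranked[n_top:]
    a = sum(y for _, y in top)   # cases in top group
    b = len(top) - a             # controls in top group
    c = sum(y for _, y in rest)  # cases in remainder
    d = len(rest) - c            # controls in remainder
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))

# Toy data (illustrative only): 4 cases, 4 controls.
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]
labels = [0, 0, 1, 1, 1, 0, 1, 0]

print(auc(scores, labels))  # 0.875
print(cohens_d(scores, labels))
print(top_vs_rest_odds_ratio(scores, labels, top_fraction=0.25))
```

AUC summarizes discrimination across all thresholds, Cohen's d describes how far apart the case and control distributions sit, and the odds ratio reflects risk concentration in the upper tail of the score.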
This project is under development. To reproduce or test:
- Navigate to the dataset of interest under `Data/`
- Set up the tools required under `Tools/`
- Run the PRS pipeline using `pgsc_calc` or the custom Python workflow
- Validate the scores using the scripts in `Validation/`
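
As a rough sketch of the "run the PRS pipeline" step, the helper below assembles a `pgsc_calc` command line from Python without executing it. The helper name `build_pgsc_calc_command`, the samplesheet path, the reference path, and the score ID `PGS000000` are hypothetical placeholders. The flags shown (`--input`, `--target_build`, `--pgs_id`, `--run_ancestry`) are written from memory of the pgsc_calc documentation and should be checked against the version you have installed.

```python
import shlex

def build_pgsc_calc_command(samplesheet, pgs_id, target_build="GRCh38",
                            profile="docker", ancestry_reference=None):
    """Assemble (but do not run) a Nextflow command for pgsc_calc.

    All argument values are supplied by the caller; nothing here is
    specific to this repository's datasets.
    """
    cmd = [
        "nextflow", "run", "pgscatalog/pgsc_calc",
        "-profile", profile,
        "--input", samplesheet,
        "--target_build", target_build,
        "--pgs_id", pgs_id,
    ]
    # Ancestry adjustment is optional and needs a reference panel path.
    if ancestry_reference is not None:
        cmd += ["--run_ancestry", ancestry_reference]
    return cmd

# Placeholder inputs: replace with a real samplesheet and PGS Catalog score ID.
cmd = build_pgsc_calc_command("samplesheet.csv", "PGS000000")
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once the inputs exist. Building the command as a list rather than a string avoids shell-quoting problems with paths that contain spaces.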
- Lambert et al. (2021), The Polygenic Score Catalog as an open database for reproducibility and systematic evaluation.
- All of Us Research Program, NIH
Valli Sree Lasya Pasumarthy
Graduate Research – Jordan Lab
Master of Science in Bioinformatics
School of Biological Sciences, Georgia Institute of Technology