A Nextflow DSL2 pipeline for processing Oxford Nanopore long-read sequencing data to generate haplotype-resolved variant calls and DNA methylation profiles.
- Basecalling from POD5 files using Dorado
- Read alignment using minimap2
- Variant calling and haplotagging using PEPPER-Margin-DeepVariant
- Methylation calling using modkit
- Summary reporting using MultiQC
- Nextflow version 22.10.0 or higher
- Singularity or Docker (or a compatible container runtime)
- Prepare your sample sheet (CSV):

      sampleid,data_dir
      sample1,/path/to/sample1/pod5/
      sample2,/path/to/sample2/pod5/

  Note: The pipeline expects the full path of the directory containing the `*.pod5` or `*.fast5` files. Your directory structure should look like:

      /path/to/sample1/
      └── pod5/
          ├── file1.pod5
          ├── file2.pod5
          └── ...

  or

      /path/to/sample1/
      └── fast5/
          ├── file1.fast5
          ├── file2.fast5
          └── ...

- Optionally, define target regions (BED):

      chr1 1000000 2000000
      chr2 3000000 4000000

- Configure your reference and parameters in `nextflow.config`:

      params {
          sample_sheet = "samplesheet.csv"
          basecall_model = "dna_r10.4.1_e8.2_400bps_sup@v5.2.0"
          basecall_modifications = "5mCG_5hmCG"
          reference = "GrCh38.fa"
          regions_bed = "regions.bed"
          outdir = "results"
      }

- Run the pipeline:

      nextflow run main.nf -profile docker
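Before launching a run, it can help to confirm that the sample sheet matches the layout described above. The following is a minimal sketch, not part of the pipeline: the function names `read_sample_sheet` and `check_sample_sheet` are hypothetical, and the only assumptions taken from this README are the `sampleid,data_dir` header and the requirement that each `data_dir` is a full path to a directory containing `*.pod5` or `*.fast5` files.

```python
import csv
from pathlib import Path

# Raw-signal file extensions the README says a data_dir may contain.
RAW_SUFFIXES = {".pod5", ".fast5"}


def read_sample_sheet(handle):
    """Return (sampleid, data_dir) pairs from an open CSV handle."""
    reader = csv.DictReader(handle)
    missing = {"sampleid", "data_dir"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"sample sheet is missing columns: {sorted(missing)}")
    return [(row["sampleid"].strip(), row["data_dir"].strip()) for row in reader]


def check_sample_sheet(rows):
    """Return a list of problems; an empty list means none were found."""
    problems = []
    for sample_id, data_dir in rows:
        path = Path(data_dir)
        if not path.is_absolute():
            problems.append(f"{sample_id}: data_dir is not a full path: {data_dir}")
        elif not path.is_dir():
            problems.append(f"{sample_id}: data_dir does not exist: {data_dir}")
        elif not any(p.suffix in RAW_SUFFIXES for p in path.iterdir()):
            problems.append(f"{sample_id}: no .pod5 or .fast5 files in {data_dir}")
    return problems
```

A typical use would be `check_sample_sheet(read_sample_sheet(open("samplesheet.csv", newline="")))`, printing any returned messages before calling `nextflow run`.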
Note
- The reference genome file must have a corresponding index (`.fai`) file in the same directory. Generate it with: `samtools faidx GrCh38.fa`
- For information on selecting basecall models and modification options, refer to the Dorado documentation.
- When `--regions_bed` is provided:
  - Only reads overlapping the specified regions are extracted during haplotype splitting.
  - Methylation calling and DMR detection are restricted to these regions.
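The two input requirements in this note (a `.fai` index next to the reference, and a well-formed regions BED) can be checked up front. The sketch below is illustrative only; `fai_path` and `parse_bed` are hypothetical helpers, and rejecting intervals whose end is not greater than their start is an assumption made here for target regions, not a rule stated by the pipeline.

```python
from pathlib import Path


def fai_path(reference):
    """Path where `samtools faidx` writes the index: the file name plus '.fai'."""
    return Path(str(reference) + ".fai")


def parse_bed(lines):
    """Parse the first three BED columns into (chrom, start, end) tuples."""
    regions = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        # Skip blank lines, comments, and optional track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"line {number}: expected at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # Assumption: a usable target region is non-empty (end > start >= 0).
        if start < 0 or end <= start:
            raise ValueError(f"line {number}: invalid interval {start}-{end}")
        regions.append((chrom, start, end))
    return regions
```

For example, `fai_path("GrCh38.fa").exists()` reports whether the index is present, and `parse_bed(open("regions.bed"))` raises a `ValueError` naming the first malformed line.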
The pipeline consists of the following main steps:
- Basecalling (`ONT_BASECALL`):
  - Performs basecalling and modified base detection using Dorado. The output BAM file contains MM (modification type) and ML (modification likelihood) tags.
  - Automatically detects and converts FAST5 files to POD5 format if needed before basecalling.
  - Output: Unaligned BAM with methylation tags.
- Alignment (`ALIGNMENT`):
  - Aligns reads to the reference genome with minimap2.
  - Output: Aligned BAM, alignment statistics.
- Variant Calling and Phasing (`VARIANT_CALL`):
  - Calls variants and haplotags reads using PEPPER-Margin-DeepVariant.
  - Output: Haplotagged BAM and VCF.
- Haplotype Splitting (`SPLIT_BAM`):
  - Splits the haplotagged BAM into haplotype-specific files (HP1, HP2, and untagged reads).
  - Optionally filters reads by genomic regions when `--regions_bed` is provided.
  - Output: Three BAM files per sample (HP1, HP2, untagged) with their indices.
- Haplotype-Resolved Methylation Calling (`METHYLATION_CALL`):
  - Extracts and aggregates methylation calls separately for each haplotype using modkit.
  - Generates both BED and bedGraph formats.
  - Output: Methylation BED and bedGraph files for HP1 and HP2.
- Differentially Methylated Regions (DMR) (`DMR_CALL`):
  - Identifies regions with significant methylation differences between haplotypes using `modkit dmr pair`.
  - Output: DMR BED file and analysis log.
- Summary (`SUMMARY`):
  - Generates an aggregated QC report using MultiQC.
  - Output: MultiQC report.
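The haplotype splitting step relies on the `HP` tag that haplotagging adds to each phased read. As a rough illustration of that partitioning logic, and not the pipeline's actual `SPLIT_BAM` implementation (which is not shown in this README), the sketch below bins plain-text SAM records by `HP:i:1`, `HP:i:2`, or no tag. The function names are hypothetical, and it works on SAM text lines rather than BAM files to stay dependency-free.

```python
def haplotype_of(sam_line):
    """Return 'HP1', 'HP2', or 'untagged' based on the HP tag of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    # A SAM record has 11 mandatory columns; optional TAG:TYPE:VALUE fields follow.
    for field in fields[11:]:
        if field.startswith("HP:i:"):
            value = field[len("HP:i:"):]
            if value in ("1", "2"):
                return "HP" + value
    return "untagged"


def split_by_haplotype(sam_lines):
    """Group alignment records into HP1, HP2, and untagged lists."""
    groups = {"HP1": [], "HP2": [], "untagged": []}
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignment
            continue
        groups[haplotype_of(line)].append(line)
    return groups
```

The three lists correspond to the three per-sample BAM files the step produces; a real implementation would also carry the header into each output and index the results.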
All outputs are organized per sample:
results/
sample1/
basecall/
sample1.raw.mod.bam
alignment/
sample1.aligned.sorted.bam
sample1.aligned.sorted.bam.bai
sample1.alignment.stats.txt
sample1.bam.mod.tags.txt
sample1.minimap2.log
variant_call/
sample1.aligned.sorted.haplotagged.bam
sample1.aligned.sorted.haplotagged.bam.bai
sample1.vcf
sample1.vcf.gz.tbi
sample1.visual_report.html
sample1.pepper.margin.deepvariant.log
logs/*.log
haplotypes/
sample1.HP1.bam
sample1.HP1.bam.bai
sample1.HP2.bam
sample1.HP2.bam.bai
sample1.untagged.bam
sample1.untagged.bam.bai
methylation_haplotypes/
sample1.HP1.methylation.calls.bed
sample1.HP1.modkit.pileup.bed.log
sample1.HP1.modkit.pileup.bedgraph.log
methylation_calls.HP1.bedGraph/*.bedgraph
sample1.HP2.methylation.calls.bed
sample1.HP2.modkit.pileup.bed.log
sample1.HP2.modkit.pileup.bedgraph.log
methylation_calls.HP2.bedGraph/*.bedgraph
differentially_methylated_regions/
sample1.haplotype.differentially.methylated.regions.bed
sample1.dmr.log
sample2/
...
report/
multiqc_report.html
multiqc_data/
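The layout above can be used to check whether a sample finished every stage. The following is a small sketch under the assumption that the file names follow the tree shown here exactly; `PER_SAMPLE_FILES`, `expected_outputs`, and `missing_outputs` are hypothetical names, and only one key file per stage is listed.

```python
from pathlib import Path

# One key output per stage, relative to <outdir>/<sample>/, taken from the tree above.
PER_SAMPLE_FILES = [
    "basecall/{s}.raw.mod.bam",
    "alignment/{s}.aligned.sorted.bam",
    "variant_call/{s}.aligned.sorted.haplotagged.bam",
    "haplotypes/{s}.HP1.bam",
    "haplotypes/{s}.HP2.bam",
    "methylation_haplotypes/{s}.HP1.methylation.calls.bed",
    "methylation_haplotypes/{s}.HP2.methylation.calls.bed",
    "differentially_methylated_regions/{s}.haplotype.differentially.methylated.regions.bed",
]


def expected_outputs(outdir, sample_id):
    """Build the expected output paths for one sample."""
    base = Path(outdir) / sample_id
    return [base / template.format(s=sample_id) for template in PER_SAMPLE_FILES]


def missing_outputs(outdir, sample_id):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_outputs(outdir, sample_id) if not path.exists()]
```

Calling `missing_outputs("results", "sample1")` after a run returns an empty list when all listed files are present.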
Note
- For the `*.methylation.calls.bed` file format and column descriptions, see the modkit documentation.
- For the `*.differentially.methylated.regions.bed` file format and column descriptions, see the modkit documentation.
- Adjust resource requirements and containers in `nextflow.config` and `conf/base.config`.
- Add or remove modules as needed for your workflow.
| Component | Version |
|---|---|
| Dorado | 0.9.6 |
| minimap2 | 2.30-r1287 |
| samtools | 1.13 |
| PEPPER-Margin-DeepVariant | r0.8 |
| modkit | 0.5.0 |
| MultiQC | 1.32 |
| bedtools | 2.30.0 |
| bcftools | 1.13 |
- PEPPER-Margin-DeepVariant: `kishwars/pepper_deepvariant:r0.8`
todo