poodleR helps estimate donor proportions in pooled sequencing experiments using:
- donor genotypes stored in a VCF
- pooled read counts exported by
bam-readcount
The package covers the full workflow:
- Convert donor genotypes to minor-allele dosages with
create_gt_dosages(). - Convert pooled read counts to minor-allele frequency estimates with
estimate_b_from_bam_readcount(). - Run the combined preprocessing step with
calculate_nonRef_vector_and_genotype_df(). - Estimate donor weights with
estimate_weights(). - Use
is_identical_genotype()andmod_lsqlincon()as lower-level utilities.
install.packages("remotes")
remotes::install_github(
"TrynkaLab/poodleR",
dependencies = TRUE,
build_vignettes = TRUE
)Before announcing the package publicly:
- Install the package dependencies locally.
- Run
devtools::check()orR CMD check. - Confirm the vignette builds and the example workflow runs.
- Push the repository to a public GitHub repo named
poodleR. - Update the
USERNAME/poodleRplaceholders in this README. - Create a release tag such as
v0.1.0.
This repository now includes a GitHub Actions workflow at .github/workflows/R-CMD-check.yaml so every push and pull request can run package checks automatically.
The repository includes compressed example inputs in inst/extdata:
bam_path <- system.file("extdata", "bam-readcount_example.tsv.gz", package = "poodleR")
vcf_path <- system.file("extdata", "vcf_example.vcf.gz", package = "poodleR")It also includes smaller .rda example objects in data/ for quick experimentation:
A: genotype dosage tableb: pooled allele-frequency table withb_estimateb_estimate_dt: filtered allele-frequency table aligned toAbam_readcount: imported bam-readcount examplevcf: donor genotype VCF as avcfRobject
library(poodleR)
bam_path <- system.file("extdata", "bam-readcount_example.tsv.gz", package = "poodleR")
vcf_path <- system.file("extdata", "vcf_example.vcf.gz", package = "poodleR")
prepared <- calculate_nonRef_vector_and_genotype_df(
bam_readcount_path = bam_path,
vcf_path = vcf_path,
min_read_overlap = 5
)
weights <- estimate_weights(
b = prepared$b_estimate_table$b_estimate,
A = prepared$ma_dosages
)
weights
sum(weights)Convert a vcfR object to a donor-by-SNP dosage table:
data(vcf, package = "poodleR")
dosages <- create_gt_dosages(vcf)
head(dosages[, 1:4])Each donor column is numeric with values 0, 0.5, or 1, and rn stores the SNP identifier (CHROM_POS_REF).
Convert imported bam-readcount rows to pooled allele frequencies after aligning them to VCF alleles:
data(bam_readcount, package = "poodleR")
data(vcf, package = "poodleR")
colnames(bam_readcount) <- c("CHROM", "POS", "REF", "TOTAL_READS", "DEL", "A", "C", "G", "T", "N")
bam_readcount$CHROM_POS_REF <- paste(bam_readcount$CHROM, bam_readcount$POS, bam_readcount$REF, sep = "_")
vcf_reduced <- data.table::as.data.table(vcf@fix[, 1:5])
vcf_reduced$CHROM_POS_REF <- paste(vcf_reduced$CHROM, vcf_reduced$POS, vcf_reduced$REF, sep = "_")
vcf_reduced <- vcf_reduced[vcf_reduced$CHROM_POS_REF %in% bam_readcount$CHROM_POS_REF, ]
bam_readcount <- bam_readcount[bam_readcount$CHROM_POS_REF %in% vcf_reduced$CHROM_POS_REF, ]
b_estimate_table <- estimate_b_from_bam_readcount(bam_readcount, vcf_reduced)
head(b_estimate_table)Run the full preprocessing pipeline directly from the shipped example files:
prepared <- calculate_nonRef_vector_and_genotype_df(
bam_readcount_path = bam_path,
vcf_path = vcf_path,
min_read_overlap = 5
)
names(prepared)
nrow(prepared$b_estimate_table)
nrow(prepared$ma_dosages)This returns two aligned tables ready for deconvolution:
b_estimate_table: pooled allele frequenciesma_dosages: donor genotype dosages
Estimate donor contributions from the aligned objects:
weights <- estimate_weights(
b = prepared$b_estimate_table$b_estimate,
A = prepared$ma_dosages
)
weights
sum(weights)The output is a named numeric vector constrained to sum to 1.
For a longer worked example, see the vignette:
vignette("how_to_use_poodleR", package = "poodleR")