Nextflow pipeline for generating reference databases for amplicon sequences.
🚧 Caution, this package is still under development and has not been thoroughly tested. Workflow commands and modes of operation may change. 🚧
Install Nextflow
conda create -n nextflow -c conda-forge -c bioconda -c defaults nextflow
conda activate nextflow
Clone the repo
git clone https://github.com/mikerobeson/nf-refdb-amplicon
Note: depending on the development cycle, you may also need to clone and install the latest repository version of RESCRIPt into your QIIME 2 environment. You can then comment out the lines under Detected system environment and uncomment and modify the lines under Existing environment within the nextflow.config file.
conda activate qiime2-amplicon-2024.10
git clone https://github.com/bokulich-lab/RESCRIPt
cd RESCRIPt
pip install .
Run the Nextflow pipeline:
First, change to the nf-refdb-amplicon directory:
cd nf-refdb-amplicon
- Use -profile local if running locally, or -profile cluster if running on HPC.
- Then set either ssu or ess for the params.pipeline_type parameter within the nextflow.config file prior to running one of the pipelines outlined below.
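As a rough illustration of the pipeline type setting, the relevant line in nextflow.config might look like the fragment below. Only the parameter name and its two values come from this README. The quoted-string form and the dotted assignment style are assumptions, and the shipped nextflow.config may lay this out differently (for example inside a params block), so edit the existing entry rather than pasting this in.

```groovy
// Sketch only: assumes the value is a quoted string and that the
// parameter is assigned with dotted notation. Check the shipped
// nextflow.config for the actual layout.
params.pipeline_type = 'ssu'    // or 'ess'
```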
nextflow run main.nf -profile <profile>
Note: the ess pipeline is currently in alpha development. You'll have to provide files using the params.segseqs, params.seqs, and params.taxa parameters in the config file.
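A minimal sketch of what the ess settings could look like in nextflow.config follows. The four parameter names are the ones given in this README. The paths are hypothetical placeholders, and the expected file formats are not stated here, so treat the values purely as stand-ins for your own files.

```groovy
// Sketch only: every path below is a hypothetical placeholder.
// Replace each one with the location of your own input file.
params.pipeline_type = 'ess'
params.segseqs = '/path/to/your/segseqs-file'
params.seqs    = '/path/to/your/seqs-file'
params.taxa    = '/path/to/your/taxa-file'
```

Because the ess pipeline is in alpha development, the set of required parameters may change, so check the comments in the shipped nextflow.config before running.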
If you make use of this pipeline please cite RESCRIPt:
- Michael S Robeson II, Devon R O'Rourke, Benjamin D Kaehler, Michal Ziemski, Matthew R Dillon, Jeffrey T Foster, Nicholas A Bokulich. (2021) RESCRIPt: Reproducible sequence taxonomy reference database management. PLoS Computational Biology 17 (11): e1009581. doi: 10.1371/journal.pcbi.1009581. GitHub.
Please be sure to cite the following as well:
- If using SILVA data: Versions are released under different licenses. Refer to the current SILVA release license information for more details. How to cite SILVA.
- If using GTDB data: See the GTDB "about" page for more details. How to cite GTDB.
- If using RDP data: See the main RDP GitHub page and the RDP SourceForge page for more details. Please cite the following RDP articles: Wang et al. 2007 & Wang et al. 2024.
- If using NCBI GenBank data: See the NCBI disclaimer and copyright notice for more details. How to cite NCBI.
Note: an older Snakemake variant of this pipeline is available here.