Commit b0c0f23

Merge pull request #506 from SkyLexS/gecco_convert_dis
Gecco convert
2 parents fcfbc3a + de6c87a commit b0c0f23

23 files changed

+569
-72
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

  - [#500](https://github.com/nf-core/funcscan/pull/500) Updated pipeline template to nf-core/tools version 3.4.1 (by @jfy133)
  - [#508](https://github.com/nf-core/funcscan/pull/508) Added support for antiSMASH's --clusterhmmer, --fullhmmer, and --tigrfam options (❤️ to @yusukepockyby for requesting, @jfy133)
+ - [#506](https://github.com/nf-core/funcscan/pull/506) Added support GECCO convert for generation of additional files useful for downstream analysis (by @SkyLexS)

  ### `Fixed`

README.md

Lines changed: 1 addition & 1 deletion
@@ -92,7 +92,7 @@ nf-core/funcscan was originally written by Jasmin Frangenberg, Anan Ibrahim, Lou

  We thank the following people for their extensive assistance in the development of this pipeline:

- Adam Talbot, Alexandru Mizeranschi, Hugo Tavares, Júlia Mir Pedrol, Martin Klapper, Mehrdad Jaberi, Robert Syme, Rosa Herbst, Vedanth Ramji, @Microbion.
+ Adam Talbot, Alexandru Mizeranschi, Hugo Tavares, Júlia Mir Pedrol, Martin Klapper, Mehrdad Jaberi, Robert Syme, Rosa Herbst, Vedanth Ramji, @Microbion, Dediu Octavian-Codrin.

  ## Contributions and Support

conf/modules.config

Lines changed: 8 additions & 0 deletions
@@ -532,6 +532,14 @@ process {
          ].join(' ').trim()
      }

+     withName: GECCO_CONVERT {
+         publishDir = [
+             path: { "${params.outdir}/bgc/gecco/${meta.id}" },
+             mode: params.publish_dir_mode,
+             saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
+         ]
+     }
+
      withName: HAMRONIZATION_ABRICATE {
          publishDir = [
              path: { "${params.outdir}/arg/hamronization/abricate" },
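
The `GECCO_CONVERT` selector added in this hunk publishes every converted file except `versions.yml` under `bgc/gecco/<sample id>`. As a minimal sketch, assuming the standard Nextflow behaviour that a user config passed with `-c` can override a `withName` block, the publish location could be redirected as follows. The `gecco_converted` subdirectory is a hypothetical name chosen for illustration and is not part of this commit.

```groovy
// custom.config (hypothetical file name), passed with `-c custom.config`
process {
    withName: GECCO_CONVERT {
        publishDir = [
            // Hypothetical target directory; the commit itself uses bgc/gecco/${meta.id}
            path: { "${params.outdir}/bgc/gecco_converted/${meta.id}" },
            mode: params.publish_dir_mode,
            // Same filter as in the commit: returning null skips publishing versions.yml
            saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
        ]
    }
}
```

Because the assignment replaces the whole `publishDir` value, the `saveAs` filter is repeated here so that `versions.yml` stays unpublished.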

conf/test_bgc_bakta.config

Lines changed: 5 additions & 1 deletion
@@ -23,7 +23,7 @@ params {
      config_profile_description = 'Minimal test dataset to check BGC workflow function'

      // Input data
-     input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_reduced.csv'
+     input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_hits.csv'
      bgc_antismash_db = params.pipelines_testdata_base_path + 'funcscan/databases/antismash_trimmed_8_0_1.tar.gz'

      annotation_tool = 'bakta'
@@ -33,6 +33,10 @@ params {
      run_amp_screening = false
      run_bgc_screening = true

+     bgc_gecco_runconvert = true
+     bgc_gecco_convertmode = 'gbk'
+     bgc_gecco_convertformat = 'bigslice'
+
      bgc_run_hmmsearch = true
      bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'
  }

conf/test_bgc_prokka.config

Lines changed: 5 additions & 1 deletion
@@ -23,7 +23,7 @@ params {
      config_profile_description = 'Minimal test dataset to check BGC workflow function'

      // Input data
-     input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_reduced.csv'
+     input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_hits.csv'
      bgc_antismash_db = params.pipelines_testdata_base_path + 'funcscan/databases/antismash_trimmed_8_0_1.tar.gz'

      annotation_tool = 'prokka'
@@ -32,6 +32,10 @@ params {
      run_amp_screening = false
      run_bgc_screening = true

+     bgc_gecco_runconvert = true
+     bgc_gecco_convertmode = 'gbk'
+     bgc_gecco_convertformat = 'fna'
+
      bgc_run_hmmsearch = true
      bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'
  }

conf/test_bgc_pyrodigal.config

Lines changed: 5 additions & 1 deletion
@@ -23,7 +23,7 @@ params {
      config_profile_description = 'Minimal test dataset to check BGC workflow function'

      // Input data
-     input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_reduced.csv'
+     input = params.pipelines_testdata_base_path + 'funcscan/samplesheet_hits.csv'
      bgc_antismash_db = params.pipelines_testdata_base_path + 'funcscan/databases/antismash_trimmed_8_0_1.tar.gz'

      annotation_tool = 'pyrodigal'
@@ -32,6 +32,10 @@ params {
      run_amp_screening = false
      run_bgc_screening = true

+     bgc_gecco_runconvert = true
+     bgc_gecco_convertmode = 'clusters'
+     bgc_gecco_convertformat = 'gff'
+
      bgc_run_hmmsearch = true
      bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'
  }

conf/test_preannotated_bgc.config

Lines changed: 2 additions & 0 deletions
@@ -32,6 +32,8 @@ params {
      run_amp_screening = false
      run_bgc_screening = true

+     bgc_gecco_runconvert = true
+
      bgc_run_hmmsearch = true
      bgc_hmmsearch_models = 'https://raw.githubusercontent.com/antismash/antismash/fd61de057e082fbf071732ac64b8b2e8883de32f/antismash/detection/hmm_detection/data/ToyB.hmm'
  }

docs/output.md

Lines changed: 8 additions & 2 deletions
@@ -457,15 +457,21 @@ Note that filtered FASTA is only used for BGC workflow for run-time optimisation
  <summary>Output files</summary>

  - `gecco/`
+ - **GECCO**
  - `*.genes.tsv/`: TSV file containing detected/predicted genes with BGC probability scores
  - `*.features.tsv`: TSV file containing identified domains
  - `*.clusters.tsv`: TSV file containing coordinates of predicted clusters and BGC types
  - `*_cluster_*.gbk`: GenBank file (if clusters were found) containing sequence with annotations; one file per GECCO hit
-
- </details>
+ - `*.gff`: GFF3 converted cluster tables containing the position and metadata for all the predicted clusters (only if `--bgc_gecco_runconvert --bgc_gecco_convertmode clusters --bgc_gecco_convertformat gff`)
+ - `*.region*.gbk`: Converted and aliased GenBank files so that they can be loaded by BiG-SLiCE (only if `--bgc_gecco_runconvert --bgc_gecco_convertmode gbk --bgc_gecco_convertformat bigslice`)
+ - `*.faa`: Amino-acid FASTA converted GenBank files of all the proteins in a cluster (only if `--bgc_gecco_runconvert --bgc_gecco_convertmode gbk --bgc_gecco_convertformat faa`)
+ - `*.fna`:Nucleotide sequence FASTA converted GenBank files from the cluster (only if `--bgc_gecco_runconvert --bgc_gecco_convertmode gbk --bgc_gecco_convertformat fna`)
+ </details>

  [GECCO](https://gecco.embl.de) is a fast and scalable method for identifying putative novel Biosynthetic Gene Clusters (BGCs) in genomic and metagenomic data using Conditional Random Fields (CRFs).

+ The additional GFF3, GenBank, or FASTA files from `--bgc_gecco_runconvert`, can be useful for additional further analysis of the BGC hits.
+
  ### Summary tools

  [AMPcombi](#ampcombi), [hAMRonization](#hamronization), [comBGC](#combgc), [MultiQC](#multiqc), [pipeline information](#pipeline-information), [argNorm](#argnorm).
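
The mode and format combinations documented in the `docs/output.md` hunk could also be set in a custom config instead of on the command line. The following is a minimal sketch, assuming only the parameter names and values that appear in this commit's test configs and documentation; the file name `gecco_convert.config` is hypothetical.

```groovy
// gecco_convert.config (hypothetical file name), passed with `-c gecco_convert.config`
params {
    run_bgc_screening       = true
    bgc_gecco_runconvert    = true

    // Combinations listed in docs/output.md in this commit:
    //   mode 'clusters' + format 'gff'      -> *.gff
    //   mode 'gbk'      + format 'bigslice' -> *.region*.gbk
    //   mode 'gbk'      + format 'faa'      -> *.faa
    //   mode 'gbk'      + format 'fna'      -> *.fna
    bgc_gecco_convertmode   = 'gbk'
    bgc_gecco_convertformat = 'bigslice'
}
```

With the publishing rule added in `conf/modules.config`, the converted files would be expected under `<outdir>/bgc/gecco/<sample id>/`.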

modules/nf-core/gecco/convert/environment.yml

Lines changed: 7 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

modules/nf-core/gecco/convert/main.nf

Lines changed: 56 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
