Add modules and subworkflows for single-cell DNA processing. #79

Open: ljwharbers wants to merge 92 commits into nf-core:dev from ljwharbers:scdnalong

@ljwharbers (Contributor) commented Aug 14, 2025

The following key changes have been made to enable both cDNA and DNA sample processing with the same pipeline.

Modules added:

  • The following modules have been added: flexiplex/discovery, flexiplex/filter, flexiplex/assign and flexiformatter. The flexiplex modules are used to extract, filter and assign barcodes. flexiformatter is used to move the barcodes from the read name to the BAM tags.
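To illustrate what moving barcodes from the read name to tags means, here is a minimal Python sketch that works on a single SAM text line. It is not the flexiformatter implementation. The read-name layout `BARCODE_UMI#READID` and the tag names `CB` and `UB` are assumptions made for the example, and `move_barcode_to_tags` is a hypothetical helper.

```python
def move_barcode_to_tags(sam_line):
    """Move a barcode/UMI prefix from the read name into SAM tags.

    Assumed (hypothetical) read-name layout: BARCODE_UMI#READID.
    Assumed tag names: CB for the barcode, UB for the UMI.
    """
    fields = sam_line.rstrip("\n").split("\t")
    prefix, sep, read_id = fields[0].partition("#")
    if not sep:
        # No '#' separator: the read carries no barcode prefix, leave it as-is.
        return "\t".join(fields)
    barcode, _, umi = prefix.partition("_")
    fields[0] = read_id                 # restore the original read name
    fields.append(f"CB:Z:{barcode}")    # barcode becomes a string tag
    if umi:
        fields.append(f"UB:Z:{umi}")    # UMI tag only when one is present
    return "\t".join(fields)


record = "ACGT_TTAA#read1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII"
print(move_barcode_to_tags(record))
```

Keeping the barcode in tags rather than in the read name means downstream tools see the original read ID and can still group reads by barcode.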

Subworkflows added:

  • Demultiplexing has been moved into two subworkflows, demultiplex_blaze.nf and demultiplex_flexiplex.nf, which run the demultiplexing steps for the respective tool.
  • One subworkflow has been added: align_deduplicate_dna.nf, which performs minimap2 alignment followed by picard MarkDuplicates and BAM_SORT_STATS_SAMTOOLS.

Input changes:

  • The samplesheet now accepts an additional column, type, which can be either dna or cdna. The column is optional and defaults to cdna to maintain compatibility with older samplesheets.
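The defaulting rule can be sketched in Python as follows. This only illustrates the intended behaviour and is not the pipeline's samplesheet validation code. The column names `sample` and `fastq` are placeholders, and `read_samplesheet` is a hypothetical helper.

```python
import csv
import io

VALID_TYPES = {"dna", "cdna"}


def read_samplesheet(handle):
    """Return the samplesheet rows, with a missing or empty type set to 'cdna'."""
    rows = []
    for row in csv.DictReader(handle):
        # A missing column yields None and an empty cell yields "",
        # so both cases fall through to the default.
        sample_type = (row.get("type") or "").strip().lower() or "cdna"
        if sample_type not in VALID_TYPES:
            raise ValueError(f"type must be 'dna' or 'cdna', got {sample_type!r}")
        row["type"] = sample_type
        rows.append(row)
    return rows


# An older samplesheet without the type column still parses.
old_sheet = io.StringIO("sample,fastq\nS1,s1.fastq.gz\n")
new_sheet = io.StringIO("sample,fastq,type\nS2,s2.fastq.gz,dna\nS3,s3.fastq.gz,\n")
print([r["type"] for r in read_samplesheet(old_sheet)])  # ['cdna']
print([r["type"] for r in read_samplesheet(new_sheet)])  # ['dna', 'cdna']
```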

Config changes:

  • The whitelist parameter is removed and replaced by two new parameters, whitelist_dna and whitelist_cdna, to specify a DNA and a cDNA whitelist, respectively.
  • The whitelists included in assets/ are now .gz instead of .zip. This makes them easier to use with blaze/flexiplex: a single gunzip module decompresses the file if its name ends with .gz.
  • New config parameter: demux_tool, which specifies whether to use flexiplex or blaze for demultiplexing cDNA samples.
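The whitelist handling described above can be sketched in Python. In the pipeline this logic lives in Nextflow, so the code below is only an illustration. `select_whitelist` and `ensure_plain_text` are hypothetical helpers, and the `params` dict and its file paths stand in for the pipeline parameters.

```python
import gzip
import shutil
from pathlib import Path


def select_whitelist(sample_type, params):
    """Pick the whitelist parameter that matches the sample type."""
    key = {"dna": "whitelist_dna", "cdna": "whitelist_cdna"}[sample_type]
    return Path(params[key])


def ensure_plain_text(whitelist, workdir):
    """Decompress the whitelist only if its name ends with .gz."""
    whitelist = Path(whitelist)
    if whitelist.suffix != ".gz":
        return whitelist  # already plain text, nothing to do
    out = Path(workdir) / whitelist.stem  # "wl.txt.gz" -> "wl.txt"
    with gzip.open(whitelist, "rb") as src, open(out, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out


# Placeholder paths, for illustration only.
params = {"whitelist_dna": "assets/dna.txt.gz", "whitelist_cdna": "assets/cdna.txt.gz"}
print(select_whitelist("dna", params).name)  # dna.txt.gz
```

Checking the file name suffix means user-supplied plain-text whitelists and the bundled .gz whitelists take the same code path.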

QC changes:

  • The script used by READ_COUNTS() has been edited so that it accepts both flexiplex and blaze output as input.

TODO:
Update readmes and ...?

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/scnanoseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

atrull314 and others added 30 commits October 7, 2024 09:21
Performing merge for first release
…ode_formats. Also change dlogic for whitelist gunzipping
@atrull314 (Collaborator)

I know we talked in Slack, but just to put it here as well.

One of the other items we'll need is test data for the DNA portions of the pipeline (and updates to the test.configs or perhaps even new test configs/profiles), with the downsampled and normal FASTQs going here: https://github.com/nf-core/test-datasets/tree/scnanoseq

@ljwharbers (Contributor, Author)

I am sorry it turned into this beast of a PR... :')

I have tested it and done some last-minute refactoring of the outdirs. Also added some test data that runs through without a problem on my end. Feel free to have a look and let me know if there is anything needed on my end @atrull314
