Summary
Based on research of common featureCounts use cases on GitHub, fcount-rs already supports the primary RNA-seq workflow well. This issue tracks three missing features to expand coverage:
| Feature | Priority | Use Case |
|---|---|---|
| SAF format | High | ATAC-seq/ChIP-seq peak counting |
| Long-read mode (`-L`) | Medium | Nanopore/PacBio transcriptomics |
| Per-read details (`-R`) | Low | Debugging/QC |
1. SAF Format Support
Why: SAF (Simplified Annotation Format) is essential for ATAC-seq and ChIP-seq workflows where users count reads in peaks rather than genes. This is the third most common use case on GitHub (after RNA-seq and paired-end counting).
SAF Format:
GeneID Chr Start End Strand
peak1 chr1 100 200 +
peak2 chr1 500 600 -
- Tab-separated, 5 columns
- Optional header row
- 1-based coordinates
Implementation:
- Auto-detect SAF format from file extension (`.saf`) or content (5 columns, no feature type)
- Add `src/annotation/saf.rs` parser
- Each SAF row becomes one Feature (peaks are standalone, no exon grouping)
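A minimal sketch of what the parser could look like, assuming a standalone `SafFeature` record (fcount-rs's actual `Feature` type may differ) and a header heuristic that skips the first non-blank line when its `Start` column is not numeric. The names `SafFeature`, `SafError`, `parse_saf`, and `has_saf_extension` are hypothetical and only illustrate the points above.

```rust
use std::path::Path;

/// Hypothetical record for one SAF row; the real fcount-rs `Feature` may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct SafFeature {
    pub id: String,
    pub chrom: String,
    pub start: u64, // 1-based, inclusive
    pub end: u64,   // 1-based, inclusive
    pub strand: char,
}

/// Hypothetical error type carrying the 1-based line number of the problem.
#[derive(Debug, PartialEq)]
pub struct SafError {
    pub line: usize,
    pub msg: String,
}

fn saf_err(line: usize, msg: &str) -> SafError {
    SafError { line, msg: msg.to_string() }
}

/// Extension-based detection: true for `.saf` (case-insensitive).
pub fn has_saf_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map_or(false, |e| e.eq_ignore_ascii_case("saf"))
}

/// Parse tab-separated SAF text. Each data row becomes one feature.
pub fn parse_saf(text: &str) -> Result<Vec<SafFeature>, SafError> {
    let mut features = Vec::new();
    let mut seen_first = false;
    for (idx, line) in text.lines().enumerate() {
        let line_no = idx + 1;
        if line.trim().is_empty() {
            continue;
        }
        let cols: Vec<&str> = line.split('\t').collect();
        if cols.len() < 5 {
            return Err(saf_err(line_no, "expected at least 5 tab-separated columns"));
        }
        let start = cols[2].trim().parse::<u64>();
        if !seen_first {
            seen_first = true;
            // Assumed heuristic: a non-numeric Start on the first row means a header.
            if start.is_err() {
                continue;
            }
        }
        let start = start.map_err(|_| saf_err(line_no, "Start is not an integer"))?;
        let end = cols[3]
            .trim()
            .parse::<u64>()
            .map_err(|_| saf_err(line_no, "End is not an integer"))?;
        if start == 0 || end < start {
            return Err(saf_err(line_no, "coordinates must be 1-based with Start <= End"));
        }
        let strand = match cols[4].trim() {
            "+" => '+',
            "-" => '-',
            "." => '.',
            _ => return Err(saf_err(line_no, "Strand must be '+', '-' or '.'")),
        };
        features.push(SafFeature {
            id: cols[0].to_string(),
            chrom: cols[1].to_string(),
            start,
            end,
            strand,
        });
    }
    Ok(features)
}
```

Content-based detection (5 columns, no feature type) is left out of this sketch; it could reuse the same column checks on the first few rows.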
2. Long-Read Mode (-L)
Why: Growing use of Nanopore/PacBio for transcriptomics (e.g., scNanoGPS for single-cell nanopore). The -L flag in featureCounts adjusts counting for long-read characteristics.
What -L does:
- Counts reads spanning multiple exons as a single feature
- Adjusts overlap handling for reads that may span entire genes
- Handles the many CIGAR operations typical of long reads
Implementation:
- Add `-L/--long-reads` flag to CLI
- Adjust overlap resolution for long-read characteristics
- Consider minimum overlap as fraction of feature, not read
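The last point can be illustrated with a small sketch. It assumes reads and features are single contiguous 1-based inclusive intervals with start <= end, which ignores spliced alignments, and the names `OverlapBasis`, `overlap_len`, and `passes_min_overlap` are hypothetical rather than existing fcount-rs items.

```rust
/// Which interval length the minimum-overlap fraction is measured against.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OverlapBasis {
    Read,
    Feature,
}

/// Number of overlapping bases between two 1-based inclusive intervals.
pub fn overlap_len(a: (u64, u64), b: (u64, u64)) -> u64 {
    let start = a.0.max(b.0);
    let end = a.1.min(b.1);
    if end >= start {
        end - start + 1
    } else {
        0
    }
}

/// True if the overlap covers at least `min_frac` of the chosen interval.
/// Assumes start <= end for both intervals, so the denominator is never zero.
pub fn passes_min_overlap(
    read: (u64, u64),
    feature: (u64, u64),
    min_frac: f64,
    basis: OverlapBasis,
) -> bool {
    let overlap = overlap_len(read, feature) as f64;
    let span = match basis {
        OverlapBasis::Read => read.1 - read.0 + 1,
        OverlapBasis::Feature => feature.1 - feature.0 + 1,
    };
    overlap / (span as f64) >= min_frac
}
```

With a 10 kb read covering a 101 bp feature, a 50% threshold measured against the read rejects the assignment, while the same threshold measured against the feature accepts it. That difference is the motivation for switching the basis in long-read mode.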
3. Per-Read Details Output (-R)
Why: Useful for debugging and understanding assignment decisions.
Implementation:
- Enable the existing `-R/--details` placeholder in CLI
- Output format: `read_name status gene_id overlap_length`
- Add `src/output/details.rs` writer
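One possible shape for the writer is sketched below. The `ReadDetail` struct and `write_details` function are hypothetical, the tab separator is an assumption (the issue only lists the four fields), and `NA` is an assumed placeholder for reads with no assigned gene.

```rust
use std::io::{self, Write};

/// Hypothetical per-read record; field names mirror the proposed output columns.
pub struct ReadDetail<'a> {
    pub read_name: &'a str,
    pub status: &'a str,
    pub gene_id: Option<&'a str>,
    pub overlap_length: u64,
}

/// Write one tab-separated line per read to any `Write` sink
/// (a file, stdout, or an in-memory buffer).
pub fn write_details<W: Write>(mut out: W, records: &[ReadDetail<'_>]) -> io::Result<()> {
    for r in records {
        writeln!(
            out,
            "{}\t{}\t{}\t{}",
            r.read_name,
            r.status,
            r.gene_id.unwrap_or("NA"), // assumed placeholder for unassigned reads
            r.overlap_length
        )?;
    }
    Ok(())
}
```

Taking a generic `Write` keeps the writer testable against a `Vec<u8>` and lets the caller decide on buffering, for example by wrapping a file in `std::io::BufWriter`.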
References
Common featureCounts usage patterns from GitHub:
- nf-core/atacseq - Uses featureCounts with SAF for peak counting
- CebolaLab/ATAC-seq - BED to SAF conversion for FRiP scores
- gaolabtools/scNanoGPS - featureCounts for nanopore transcriptomics