
Support common featureCounts use cases: SAF format, long-read mode, per-read details #2

@jayhesselberth

Summary

A survey of common featureCounts use cases on GitHub suggests that fcount-rs already supports the primary RNA-seq workflow well. This issue tracks three missing features that would expand coverage:

Feature                Priority  Use Case
SAF format             High      ATAC-seq/ChIP-seq peak counting
Long-read mode (-L)    Medium    Nanopore/PacBio transcriptomics
Per-read details (-R)  Low       Debugging/QC

1. SAF Format Support

Why: SAF (Simplified Annotation Format) is essential for ATAC-seq and ChIP-seq workflows, where users count reads in peaks rather than genes. It is the third most common use case on GitHub (after RNA-seq and paired-end counting).

SAF Format:

GeneID  Chr  Start  End  Strand
peak1   chr1  100    200  +
peak2   chr1  500    600  -
  • Tab-separated, 5 columns
  • Optional header row
  • 1-based coordinates
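
The format rules above could be turned into a line parser roughly as follows. This is a minimal sketch, not the fcount-rs implementation: `SafRecord` and `parse_saf_line` are hypothetical names, the header row is recognised by a first column equal to `GeneID` (case-insensitive), and accepting `.` as an unstranded marker is an assumption.

```rust
/// One SAF row. Field names are illustrative (hypothetical), not taken from fcount-rs.
#[derive(Debug, Clone, PartialEq)]
pub struct SafRecord {
    pub id: String,
    pub chrom: String,
    pub start: u64, // 1-based, exactly as written in the file
    pub end: u64,   // 1-based
    pub strand: char,
}

/// Parse one tab-separated SAF line.
/// Returns Ok(None) for blank lines and for the optional header row.
pub fn parse_saf_line(line: &str) -> Result<Option<SafRecord>, String> {
    let line = line.trim_end_matches(&['\r', '\n'][..]);
    if line.trim().is_empty() {
        return Ok(None);
    }
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 5 {
        return Err(format!("expected 5 tab-separated columns, found {}", cols.len()));
    }
    // Assumption: a header row is identified by its first column name.
    if cols[0].eq_ignore_ascii_case("GeneID") {
        return Ok(None);
    }
    let start: u64 = cols[2]
        .trim()
        .parse()
        .map_err(|e| format!("bad Start {:?}: {}", cols[2], e))?;
    let end: u64 = cols[3]
        .trim()
        .parse()
        .map_err(|e| format!("bad End {:?}: {}", cols[3], e))?;
    // Coordinates are 1-based, so 0 is never a valid start.
    if start == 0 || end < start {
        return Err(format!("invalid interval {}-{}", start, end));
    }
    let strand = match cols[4].trim() {
        "+" => '+',
        "-" => '-',
        "." => '.', // assumption: '.' means unstranded
        other => return Err(format!("bad Strand {:?}", other)),
    };
    Ok(Some(SafRecord {
        id: cols[0].to_string(),
        chrom: cols[1].to_string(),
        start,
        end,
        strand,
    }))
}
```

Calling `parse_saf_line("peak1\tchr1\t100\t200\t+")` would yield a record with `start == 100` and `end == 200`, while the header row and blank lines yield `Ok(None)`.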

Implementation:

  • Auto-detect SAF format from file extension (.saf) or content (5 columns, no feature type)
  • Add src/annotation/saf.rs parser
  • Each SAF row becomes one Feature (peaks are standalone, no exon grouping)
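
The auto-detection step could look something like the sketch below. `AnnotationFormat` and `detect_format` are hypothetical names, and the content check is deliberately crude: exactly 5 tab-separated columns is taken as SAF, 9 as GTF, anything else as unknown. A real implementation would probably want to skip comment lines and inspect more than one line.

```rust
use std::path::Path;

/// Annotation formats the detector distinguishes (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AnnotationFormat {
    Saf,
    Gtf,
}

/// Guess the annotation format from the file extension first, then from the
/// column count of the first non-comment line supplied by the caller.
pub fn detect_format(path: &Path, first_data_line: &str) -> Option<AnnotationFormat> {
    if let Some(ext) = path.extension().and_then(|e| e.to_str()) {
        if ext.eq_ignore_ascii_case("saf") {
            return Some(AnnotationFormat::Saf);
        }
        if ext.eq_ignore_ascii_case("gtf") {
            return Some(AnnotationFormat::Gtf);
        }
    }
    // Fall back to content: SAF has 5 columns and no feature-type column,
    // while GTF has 9 columns.
    let n_cols = first_data_line
        .trim_end_matches(&['\r', '\n'][..])
        .split('\t')
        .count();
    match n_cols {
        5 => Some(AnnotationFormat::Saf),
        9 => Some(AnnotationFormat::Gtf),
        _ => None,
    }
}
```

Checking the extension before the content keeps the common case cheap and lets users force SAF handling simply by naming the file `*.saf`.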

2. Long-Read Mode (-L)

Why: Growing use of Nanopore/PacBio for transcriptomics (e.g., scNanoGPS for single-cell nanopore). The -L flag in featureCounts adjusts counting for long-read characteristics.

What -L does:

  • Counts reads spanning multiple exons as a single feature
  • Adjusts overlap handling for reads that may span entire genes
  • Handles the many CIGAR operations typical of long reads
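
To illustrate the CIGAR point, here is a sketch of converting a CIGAR string of arbitrary length into reference-aligned blocks. `cigar_to_blocks` is a hypothetical helper, not featureCounts' or fcount-rs' actual logic. It assumes a 1-based leftmost position as in SAM, returns 1-based inclusive `(start, end)` blocks, splits blocks only at `N` (skipped region), and folds deletions (`D`) into the surrounding block. That last choice is an assumption.

```rust
/// Convert a SAM CIGAR string into 1-based inclusive reference blocks.
/// `pos` is the 1-based leftmost mapping position of the read.
pub fn cigar_to_blocks(pos: u64, cigar: &str) -> Result<Vec<(u64, u64)>, String> {
    let mut blocks = Vec::new();
    let mut ref_pos = pos; // next reference base to be consumed
    let mut block_start: Option<u64> = None;
    let mut len: u64 = 0;
    let mut have_digits = false;

    for ch in cigar.chars() {
        if let Some(d) = ch.to_digit(10) {
            len = len * 10 + u64::from(d);
            have_digits = true;
            continue;
        }
        if !have_digits || len == 0 {
            return Err(format!("operation '{}' has no positive length", ch));
        }
        match ch {
            // Operations that consume the reference and stay inside a block.
            'M' | '=' | 'X' | 'D' => {
                if block_start.is_none() {
                    block_start = Some(ref_pos);
                }
                ref_pos += len;
            }
            // Skipped region (e.g. an intron): close the current block.
            'N' => {
                if let Some(start) = block_start.take() {
                    blocks.push((start, ref_pos - 1));
                }
                ref_pos += len;
            }
            // Operations that do not consume the reference.
            'I' | 'S' | 'H' | 'P' => {}
            other => return Err(format!("unknown CIGAR operation '{}'", other)),
        }
        len = 0;
        have_digits = false;
    }
    if have_digits {
        return Err("CIGAR ends with a length but no operation".to_string());
    }
    if let Some(start) = block_start {
        blocks.push((start, ref_pos - 1));
    }
    Ok(blocks)
}
```

Because the loop streams over the string and stores only the finished blocks, the number of operations is unbounded, which is the property long reads need.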

Implementation:

  • Add -L/--long-reads flag to CLI
  • Adjust overlap resolution for long-read characteristics
  • Consider minimum overlap as fraction of feature, not read
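
The last bullet matters because a long read can fully cover a short feature while that overlap is only a tiny fraction of the read. A sketch of the two ways to compute the fraction, under stated assumptions: `OverlapBasis`, `overlap_len` and `passes_min_overlap` are hypothetical names, intervals are 1-based inclusive with `start <= end`, and the read's blocks do not overlap each other.

```rust
/// Which length the overlap is divided by (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OverlapBasis {
    Read,
    Feature,
}

/// Number of bases shared by two 1-based inclusive intervals.
pub fn overlap_len(a: (u64, u64), b: (u64, u64)) -> u64 {
    let start = a.0.max(b.0);
    let end = a.1.min(b.1);
    if end >= start {
        end - start + 1
    } else {
        0
    }
}

/// True if the overlapping bases make up at least `min_frac` of the chosen basis.
pub fn passes_min_overlap(
    read_blocks: &[(u64, u64)],
    feature: (u64, u64),
    min_frac: f64,
    basis: OverlapBasis,
) -> bool {
    let overlap: u64 = read_blocks.iter().map(|&b| overlap_len(b, feature)).sum();
    let denom: u64 = match basis {
        OverlapBasis::Read => read_blocks.iter().map(|b| b.1 - b.0 + 1).sum(),
        OverlapBasis::Feature => feature.1 - feature.0 + 1,
    };
    denom > 0 && (overlap as f64) / (denom as f64) >= min_frac
}
```

For a 10 kb read covering a 101 bp feature entirely, a 0.5 threshold passes with `OverlapBasis::Feature` (101/101) and fails with `OverlapBasis::Read` (101/10000).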

3. Per-Read Details Output (-R)

Why: Useful for debugging and understanding assignment decisions.

Implementation:

  • Enable the existing -R/--details placeholder in CLI
  • Output format: read_name status gene_id overlap_length
  • Add src/output/details.rs writer
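
A possible shape for the details writer, following the four fields listed above. Everything here is an assumption made for illustration: the `Status` enum and `write_detail` function are hypothetical, the status strings borrow the category names featureCounts uses in its summary output, the delimiter is a tab, and `NA` stands in for a missing gene ID.

```rust
use std::io::{self, Write};

/// Assignment outcome for one read (hypothetical subset of statuses).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Status {
    Assigned,
    NoFeatures,
    Ambiguous,
}

impl Status {
    /// Text label written to the details file (assumed naming).
    pub fn as_str(self) -> &'static str {
        match self {
            Status::Assigned => "Assigned",
            Status::NoFeatures => "Unassigned_NoFeatures",
            Status::Ambiguous => "Unassigned_Ambiguity",
        }
    }
}

/// Write one line: read_name, status, gene_id, overlap_length (tab-separated).
pub fn write_detail<W: Write>(
    out: &mut W,
    read_name: &str,
    status: Status,
    gene_id: Option<&str>,
    overlap_length: u64,
) -> io::Result<()> {
    writeln!(
        out,
        "{}\t{}\t{}\t{}",
        read_name,
        status.as_str(),
        gene_id.unwrap_or("NA"), // assumption: "NA" when no feature was hit
        overlap_length
    )
}
```

Taking any `W: Write` keeps the writer testable against an in-memory `Vec<u8>` and lets the caller wrap a file in a `BufWriter` for throughput.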
