Making free science for everybody around the world 🌍
Quick Start • Features • Installation • Architecture • Documentation • Pipelines • Citation
RAPTOR is a comprehensive framework for RNA-seq analysis that makes sophisticated differential expression workflows accessible to everyone. Instead of guessing which pipeline to use or which thresholds to set, you get ML-powered recommendations and ensemble methods for robust, reproducible results.
| Challenge | RAPTOR Solution |
|---|---|
| Which pipeline should I use? | ✅ ML recommendations based on 32 dataset features |
| Which DE method (DESeq2/edgeR/limma)? | ✅ Ensemble analysis combines all methods |
| What thresholds should I use? | ✅ 4 optimization methods for data-driven cutoffs |
| Is my data quality good enough? | ✅ 6 outlier detection methods with consensus |
| How do I know results are reliable? | ✅ Ensemble consensus with direction checking |
| What if methods disagree? | ✅ Brown's method accounts for correlation |
- Web-based interface (no coding!)
- Real-time visualizations
- Drag-and-drop data upload
- One-click ensemble analysis
- Export publication-ready reports
# Install
pip install "raptor-rnaseq[dashboard]"
# Launch dashboard
streamlit run raptor/dashboard/app.py
# Opens at http://localhost:8501
# Upload data → Profile → Get recommendation → Run ensemble → Done!
# 1. Quality check
raptor qc --counts counts.csv --metadata metadata.csv
# 2. Profile your data
raptor profile --counts counts.csv --metadata metadata.csv --group-column condition
# 3. Get ML recommendation
raptor recommend --profile profile.json --method ml
# 4. Import DE results from different methods
raptor import-de --input deseq2.csv --method deseq2
raptor import-de --input edger.csv --method edger
raptor import-de --input limma.csv --method limma
# 5. Optimize thresholds (NEW!)
raptor optimize --de-result de_results.csv --method fdr-control --fdr-target 0.05
# 6. Ensemble analysis - combine all methods (NEW!)
raptor ensemble-compare --deseq2 deseq2.csv --edger edger.csv --limma limma.csv
from raptor import (
    quick_quality_check,
    profile_data_quick,
    recommend_pipeline,
    import_de_result,
    optimize_with_fdr_control,
    ensemble_brown
)
# 1. Quality check
qc_report = quick_quality_check('counts.csv', 'metadata.csv')
print(f"Outliers: {qc_report.outliers}")
# 2. Profile data (32 features extracted)
profile = profile_data_quick('counts.csv', 'metadata.csv', group_column='condition')
print(f"BCV: {profile.bcv:.3f} ({profile.bcv_category})")
# 3. Get ML recommendation from a saved profile JSON (adjust the path to where your profile was written)
recommendation = recommend_pipeline(profile_file='profile.json', method='ml')
print(f"Recommended: {recommendation.pipeline_name} (confidence: {recommendation.confidence:.2f})")
# 4. After running DE analysis in R, import each method's results
deseq2_result = import_de_result('deseq2_results.csv', method='deseq2')
edger_result = import_de_result('edger_results.csv', method='edger')
limma_result = import_de_result('limma_results.csv', method='limma')
# 5. Optimize thresholds (NEW!)
result = optimize_with_fdr_control(deseq2_result, fdr_target=0.05)
print(f"Optimal thresholds: {result.optimal_threshold}")
# 6. Ensemble analysis - combine DESeq2, edgeR, limma (NEW!)
consensus = ensemble_brown({
    'deseq2': deseq2_result,
    'edger': edger_result,
    'limma': limma_result
})
print(f"Consensus DE genes: {len(consensus.consensus_genes)}")
- Python: 3.8 - 3.12
- R: 4.0+ (optional, for Module 6 DE analysis)
- RAM: 4 GB minimum (16 GB recommended for pipelines; STAR-based pipelines need about 32 GB)
- Disk: 500MB (Python package) / 5-8GB (with bioinformatics tools)
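Before installing, a quick check of the requirements above can save a failed setup. The sketch below is not part of RAPTOR: `check_environment` is a hypothetical helper that only checks the Python version range stated above and whether `Rscript` is on your `PATH` (R is only needed for Module 6). It does not check RAM or disk space.

```python
import shutil
import sys


def check_environment(version=None, rscript_path=None):
    """Hypothetical pre-install check for the requirements listed above.

    version: (major, minor) tuple; defaults to the running interpreter.
    rscript_path: path to Rscript; defaults to a PATH lookup.
    """
    if version is None:
        version = sys.version_info[:2]
    if rscript_path is None:
        rscript_path = shutil.which("Rscript")
    return {
        # The README lists Python 3.8 - 3.12 as supported
        "python_ok": (3, 8) <= tuple(version) <= (3, 12),
        # R is optional; only Module 6 (DESeq2/edgeR/limma in R) needs it
        "r_available": rscript_path is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```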
# Basic installation
pip install raptor-rnaseq
# With dashboard support
pip install "raptor-rnaseq[dashboard]"
# With all features
pip install "raptor-rnaseq[all]"
# Development installation
pip install "raptor-rnaseq[dev]"
Core environment (Python only, ~500MB, 5-10 min):
conda env create -f environment.yml
conda activate raptor
Full environment (with STAR, Salmon, Kallisto, R, ~5-8GB, 30-60 min):
conda env create -f environment-full.yml
conda activate raptor-full
See docs/CONDA_ENVIRONMENTS.md for a detailed comparison.
# Clone repository
git clone https://github.com/AyehBlk/RAPTOR.git
cd RAPTOR
# Install in editable mode
pip install -e .
# Or with development tools
pip install -e ".[dev]"
# Verify installation
raptor --version
pytest tests/
RAPTOR is organized into 9 modules spanning 4 analysis stages:
┌─────────────────────────────────────────────────────────────┐
│ RAPTOR v2.2.0 │
│ RNA-seq Analysis Pipeline Framework │
└─────────────────────────────────────────────────────────────┘
Stage 1: Data Preparation & QC
├── Module 1: Quick Quantification (Salmon/Kallisto)
├── Module 2: Quality Assessment (6 outlier methods)
└── Module 3: Data Profiling (32 features)
Stage 2: Pipeline Selection
├── Module 4: ML Recommender (Random Forest)
└── Module 5: Production Pipelines (6 pipelines)
├── Salmon ⭐ (recommended)
├── Kallisto (fastest)
├── STAR + featureCounts
├── STAR + RSEM
├── STAR + Salmon (unique: BAM + bootstraps)
└── HISAT2 + featureCounts
Stage 3: Differential Expression
├── Module 6: DE Analysis (R: DESeq2, edgeR, limma)
└── Module 7: DE Import (standardize any format)
Stage 4: Advanced Analysis ⭐ NEW in v2.2.0
├── Module 8: Parameter Optimization (4 methods)
│ ├── Ground Truth Optimization
│ ├── FDR Control Optimization
│ ├── Stability Optimization
│ └── Reproducibility Optimization
└── Module 9: Ensemble Analysis (5 methods)
├── Fisher's Method
├── Brown's Method
├── Robust Rank Aggregation
├── Voting Consensus
└── Weighted Ensemble
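Module 9 lists Fisher's and Brown's methods for combining each gene's p-values from DESeq2, edgeR and limma. The sketch below illustrates the textbook formulas using only the standard library; it is not RAPTOR's `ensemble_fisher` or `ensemble_brown`, and all helper names are hypothetical. Fisher's statistic X = -2 Σ ln(p_i) is compared against a chi-square distribution with 2k degrees of freedom, which assumes independent tests. Brown's method keeps X but rescales it so that X/c follows χ²(f), with c = Var(X) / (2·E[X]) and f = 2·E[X]² / Var(X), where Var(X) adds the covariances between the methods' -2 ln(p) terms. Estimating those covariances across genes (the "empirical Brown's method") is an assumption of this sketch.

```python
import math


def _upper_reg_gamma(a, x, eps=1e-15, max_iter=10_000):
    """Regularized upper incomplete gamma Q(a, x) via series or continued fraction."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series for the lower function P(a, x); Q = 1 - P
        term = total = 1.0 / a
        n = a
        for _ in range(max_iter):
            n += 1.0
            term *= x / n
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(a, x)
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(x, df):
    """Chi-square survival function; df may be non-integer (needed by Brown's method)."""
    return _upper_reg_gamma(df / 2.0, x / 2.0)


def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(ln p) ~ chi2(2k) if the k tests are independent."""
    x = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf(x, 2 * len(pvalues))


def brown_combine(pvalues, cov):
    """Brown's method; cov[i][j] = Cov(-2 ln p_i, -2 ln p_j) between methods i and j."""
    k = len(pvalues)
    x = -2.0 * sum(math.log(p) for p in pvalues)
    expected = 2.0 * k  # E[X] under the null
    variance = 4.0 * k + 2.0 * sum(cov[i][j] for i in range(k) for j in range(i + 1, k))
    c = variance / (2.0 * expected)  # scale factor
    f = 2.0 * expected ** 2 / variance  # effective degrees of freedom
    return chi2_sf(x / c, f)


def empirical_cov(pvalue_rows):
    """Sample covariance of -2 ln p between methods, estimated across genes.

    pvalue_rows: one row per gene, one column per method (e.g. DESeq2, edgeR, limma).
    """
    transformed = [[-2.0 * math.log(p) for p in row] for row in pvalue_rows]
    n, k = len(transformed), len(transformed[0])
    means = [sum(row[j] for row in transformed) / n for j in range(k)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in transformed) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]
```

In use, you would estimate `cov = empirical_cov(rows)` once from all genes (all p-values strictly positive), then call `brown_combine(row, cov)` for each gene. Two limiting cases show why Brown's method suits correlated tools: with zero covariance it reduces exactly to Fisher's method, and with perfectly correlated methods it returns the shared p-value instead of treating three identical calls as independent evidence.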
RAPTOR supports 6 production RNA-seq quantification pipelines:
| Pipeline | Memory | Time | Produces | Best For | Recommended |
|---|---|---|---|---|---|
| salmon | 8 GB | 10-20 min | genes + isoforms + bootstraps | Standard DE analysis | ⭐ YES |
| kallisto | 4 GB | 5-10 min | genes + isoforms + bootstraps | Speed priority | ✓ |
| star_featurecounts | 32 GB | 40-70 min | BAM + genes | Gene-level publication | ✓ |
| star_rsem | 32 GB | 60-120 min | BAM + genes + isoforms | Isoform analysis | ✓ |
| star_salmon | 32 GB | 50-90 min | BAM + genes + isoforms + bootstraps | Unique: BAM + bootstraps | ✓ |
| hisat2_featurecounts | 16 GB | 30-60 min | BAM + genes | Low memory systems | ✓ |
⭐ Salmon is recommended for most use cases because it balances speed and accuracy well and provides bootstrap replicates.
All pipelines support:
- ✅ Paired-end and single-end reads
- ✅ Automatic parameter optimization
- ✅ QC report generation
- ✅ Multi-threading
- ✅ Sample sheet-based workflows
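If you need to shortlist pipelines by hardware before asking the recommender, the Memory, Time and Produces columns of the table above can be encoded directly. `PIPELINES` and `feasible_pipelines` are hypothetical (not a RAPTOR command), and the figures are the approximate values copied from the table.

```python
# Approximate requirements copied from the pipeline table above:
# name -> (memory in GB, upper bound of typical run time in minutes, outputs)
PIPELINES = {
    "salmon": (8, 20, {"genes", "isoforms", "bootstraps"}),
    "kallisto": (4, 10, {"genes", "isoforms", "bootstraps"}),
    "star_featurecounts": (32, 70, {"bam", "genes"}),
    "star_rsem": (32, 120, {"bam", "genes", "isoforms"}),
    "star_salmon": (32, 90, {"bam", "genes", "isoforms", "bootstraps"}),
    "hisat2_featurecounts": (16, 60, {"bam", "genes"}),
}


def feasible_pipelines(available_gb, needs=frozenset()):
    """Pipelines that fit in `available_gb` of RAM and produce every output in `needs`,
    fastest first (by the table's upper time bound)."""
    hits = [
        name
        for name, (mem_gb, _, outputs) in PIPELINES.items()
        if mem_gb <= available_gb and set(needs) <= outputs
    ]
    return sorted(hits, key=lambda name: PIPELINES[name][1])
```

For example, on a 16 GB machine that needs BAM files, `feasible_pipelines(16, {"bam"})` leaves only `hisat2_featurecounts`, which matches its "Low memory systems" entry in the table.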
Pipeline selection:
# List available pipelines
raptor pipeline list
# Get detailed info
raptor pipeline run --name salmon --help
# Run with ML recommendation
raptor recommend --profile profile.json --method ml
# Example output: Recommended: salmon (confidence: 0.89)
raptor pipeline run --name salmon --samples samples.csv --index salmon_index/
RAPTOR/
├── raptor/ # Core Python package
│ ├── __init__.py # Package initialization (v2.2.0)
│ ├── cli.py # Command-line interface (11 commands)
│ ├── quality_assessment.py # Module 2: QC (6 methods)
│ ├── profiler.py # Module 3: Profiling (32 features)
│ ├── recommender.py # Module 4: Rule-based
│ ├── ml_recommender.py # Module 4: ML-based
│ ├── de_import.py # Module 7: DE import
│ ├── parameter_optimization.py # Module 8: Optimization ⭐ NEW
│ ├── ensemble.py # Module 9: Ensemble ⭐ NEW
│ ├── simulation.py # Simulation tools
│ │
│ ├── pipelines/ # Module 5: Production pipelines
│ │ ├── base.py
│ │ ├── salmon/
│ │ ├── kallisto/
│ │ ├── star_featurecounts/
│ │ ├── star_rsem/
│ │ ├── star_salmon/
│ │ └── hisat2_featurecounts/
│ │
│ ├── external_modules/ # Module 6: R integration
│ │ └── module6_de_analysis/
│ │ └── r_scripts/ # DESeq2, edgeR, limma
│ │
│ ├── dashboard/ # Interactive Streamlit app
│ │ ├── app.py
│ │ ├── pages/ # 9 dashboard pages
│ │ ├── components/
│ │ └── utils/
│ │
│ └── utils/ # Utilities
│ ├── validation.py
│ ├── errors.py
│ └── sample_sheet.py
│
├── docs/ # Documentation
│ ├── MODULE_1_Quick_Quantification.md
│ ├── MODULE_2_Quality_Assessment.md
│ ├── MODULE_3_Data_Profiling.md
│ ├── MODULE_3_QUICK_REFERENCE.md
│ ├── MODULE_4_Pipeline_Recommender.md
│ ├── MODULE_7_DE_Import.md
│ ├── MODULE_8_Parameter_Optimization.md ⭐ NEW
│ ├── MODULE_9_Ensemble_Analysis.md ⭐ NEW
│ ├── CONDA_ENVIRONMENTS.md
│ ├── RAPTOR_QUICK_REFERENCE.md # Cheat sheet
│ └── RAPTOR_API_DOCUMENTATION.md # Python API
│
├── examples/ # Example scripts
│ ├── 02_quality_assessment.py
│ ├── 03_data_profiler.py
│ ├── 04_recommender.py
│ ├── 07_DE_Import.py
│ ├── 08_Parameter_Optimization.py ⭐ NEW
│ └── 09_Ensemble_Analysis.py ⭐ NEW
│
├── tests/ # Test suite (85%+ coverage)
│ ├── test_profiler.py
│ ├── test_quality_assessment.py
│ ├── test_parameter_optimization.py ⭐ NEW
│ ├── test_ensemble.py ⭐ NEW
│ └── ...
│
├── templates/ # Sample sheets
│ ├── sample_sheet_paired.csv
│ └── sample_sheet_single.csv
│
├── .github/ # GitHub templates
│ └── ISSUE_TEMPLATE/
│
├── setup.py # Package setup
├── requirements.txt # Python dependencies
├── environment.yml # Conda environment (core)
├── environment-full.yml # Conda environment (complete)
├── CITATION.cff # Citation metadata
├── CHANGELOG.md # Version history
├── CONTRIBUTING.md # Contribution guidelines
└── LICENSE # MIT License
| Document | Description |
|---|---|
| Quick Start | 5-minute quick start guide |
| Installation | Detailed installation instructions |
| CONDA_ENVIRONMENTS.md | Conda setup (core vs full) |
| Document | Description |
|---|---|
| MODULE_2_Quality_Assessment.md | QC with 6 outlier methods |
| MODULE_3_Data_Profiling.md | 32-feature profiling |
| MODULE_3_QUICK_REFERENCE.md | Profiling cheat sheet |
| MODULE_4_Pipeline_Recommender.md | ML recommendations |
| MODULE_7_DE_Import.md | Import & standardize DE results |
| MODULE_8_Parameter_Optimization.md | ⭐ 4 optimization methods |
| MODULE_9_Ensemble_Analysis.md | ⭐ 5 ensemble methods |
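Module 7 standardizes DE tables that use different column names. The source column names below are the defaults produced by DESeq2 `results()`, edgeR `topTags()` and limma `topTable()`; the target schema (`gene_id`, `log2fc`, `pvalue`, `padj`) and the `standardize_de_csv` helper are assumptions for illustration, not RAPTOR's actual import format. The sketch also assumes gene IDs sit in the first column, as happens when the R result table is saved with `write.csv()` including row names.

```python
import csv
import io

# Default column names of each tool's result table -> hypothetical common schema
COLUMN_MAPS = {
    "deseq2": {"log2FoldChange": "log2fc", "pvalue": "pvalue", "padj": "padj"},
    "edger": {"logFC": "log2fc", "PValue": "pvalue", "FDR": "padj"},
    "limma": {"logFC": "log2fc", "P.Value": "pvalue", "adj.P.Val": "padj"},
}


def standardize_de_csv(handle, method):
    """Read a DE result CSV from a file-like object into dicts with the common schema."""
    mapping = COLUMN_MAPS[method]
    reader = csv.reader(handle)
    header = next(reader)
    index = {name: i for i, name in enumerate(header)}
    missing = [col for col in mapping if col not in index]
    if missing:
        raise ValueError(f"{method}: missing columns {missing}")
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        record = {"gene_id": fields[0]}  # row names written by write.csv()
        for source, target in mapping.items():
            value = fields[index[source]]
            # DESeq2 writes NA, e.g. for independently filtered genes; keep as None
            record[target] = None if value in ("NA", "") else float(value)
        rows.append(record)
    return rows


if __name__ == "__main__":
    demo = io.StringIO('"","logFC","AveExpr","t","P.Value","adj.P.Val","B"\n"G1",-2,5,-4,0.001,0.01,3\n')
    print(standardize_de_csv(demo, "limma"))
```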
| Document | Description |
|---|---|
| RAPTOR_QUICK_REFERENCE.md | Command cheat sheet |
| RAPTOR_API_DOCUMENTATION.md | Complete Python API |
| examples/ | Example scripts for all modules |
| CHANGELOG.md | Version history |
from raptor import (
quick_quality_check,
profile_data_quick,
recommend_pipeline,
import_deseq2,
import_edger,
import_limma,
optimize_with_fdr_control,
ensemble_brown
)
# 1. Quality Check
print("Step 1: Quality Assessment...")
qc_report = quick_quality_check('counts.csv', 'metadata.csv')
if len(qc_report.outliers) > 0:
    print(f"⚠️ Warning: {len(qc_report.outliers)} outliers detected")
else:
    print("✅ No outliers detected")
# 2. Profile Data (32 features)
print("\nStep 2: Data Profiling...")
profile = profile_data_quick('counts.csv', 'metadata.csv', group_column='condition')
print(f" BCV: {profile.bcv:.3f} ({profile.bcv_category})")
print(f" Sample size: {profile.n_samples}")
# 3. Get ML Recommendation
print("\nStep 3: ML Recommendation...")
rec = recommend_pipeline(profile_file='results/profile/data_profile.json', method='ml')
print(f" Recommended: {rec.pipeline_name} (confidence: {rec.confidence:.2f})")
# [Between steps 3 and 4: run the recommended pipeline, then DE analysis in R]
# 4. Import DE Results
print("\nStep 4: Import DE Results...")
deseq2 = import_deseq2('deseq2_results.csv')
edger = import_edger('edger_results.csv')
limma = import_limma('limma_results.csv')
# 5. Optimize Thresholds (NEW!)
print("\nStep 5: Optimize Thresholds...")
opt_result = optimize_with_fdr_control(deseq2, fdr_target=0.05)
print(f" Optimal FDR: {opt_result.optimal_threshold['padj']:.3f}")
print(f" Optimal |logFC|: {opt_result.optimal_threshold['lfc']:.3f}")
# 6. Ensemble Analysis (NEW!)
print("\nStep 6: Ensemble Analysis (Brown's Method)...")
consensus = ensemble_brown({
    'deseq2': deseq2,
    'edger': edger,
    'limma': limma
})
print(f" Consensus genes: {len(consensus.consensus_genes)}")
print(f" Direction consistency: {consensus.direction_consistency.mean():.1%}")
# 7. Export Results
consensus.to_csv('consensus_genes.csv')
print("\n✅ Analysis complete!")
from raptor import import_de_result, ensemble_fisher, ensemble_brown, ensemble_rra
# Import results from different tools
deseq2 = import_de_result('deseq2_results.csv', method='deseq2')
edger = import_de_result('edger_results.csv', method='edger')
limma = import_de_result('limma_results.csv', method='limma')
# Try multiple ensemble methods
results = {}
# Fisher's Method (classic)
results['fisher'] = ensemble_fisher({'deseq2': deseq2, 'edger': edger, 'limma': limma})
# Brown's Method (recommended - accounts for correlation)
results['brown'] = ensemble_brown({'deseq2': deseq2, 'edger': edger, 'limma': limma})
# Robust Rank Aggregation
results['rra'] = ensemble_rra({'deseq2': deseq2, 'edger': edger, 'limma': limma})
# Compare results
for method, result in results.items():
    print(f"{method}: {len(result.consensus_genes)} consensus genes")
# Use Brown's method (best for correlated methods)
final_result = results['brown']
final_result.to_csv('final_consensus.csv')
#!/bin/bash
# Complete RAPTOR v2.2.0 workflow using CLI
set -euo pipefail  # abort at the first failing step
# Step 1: QC
raptor qc --counts counts.csv --metadata metadata.csv --output qc_results/
# Step 2: Profile
raptor profile --counts counts.csv --metadata metadata.csv --group-column condition
# Step 3: Recommend
raptor recommend --profile profile.json --method ml
# Step 4: Import DE results
raptor import-de --input deseq2_results.csv --method deseq2 --output imported/
raptor import-de --input edger_results.csv --method edger --output imported/
raptor import-de --input limma_results.csv --method limma --output imported/
# Step 5: Optimize thresholds (NEW!)
raptor optimize --de-result imported/deseq2.csv --method fdr-control --fdr-target 0.05
# Step 6: Ensemble analysis (NEW!)
raptor ensemble-compare \
--deseq2 imported/deseq2.csv \
--edger imported/edger.csv \
--limma imported/limma.csv \
--output ensemble_results/
echo "✅ Complete! Check ensemble_results/ for consensus genes."
| Module | Time | Memory | Key Output |
|---|---|---|---|
| Module 2: QC | 1-5 min | 4 GB | Consensus of 6 outlier methods |
| Module 3: Profiler | 1-2 min | 4 GB | 32 features + BCV |
| Module 4: Recommender | <10 sec | <1 GB | ML recommendation |
| Module 8: Optimization | 5-30 min | 4 GB | Optimal thresholds |
| Module 9: Ensemble | <1 min | 2 GB | Consensus genes |
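Module 8's FDR-control optimization derives cutoffs from the data rather than from convention. As a rough illustration of the trade-off such an optimizer searches (not RAPTOR's `optimize_with_fdr_control`, whose selection criterion is documented in MODULE_8_Parameter_Optimization.md), the sketch below applies the Benjamini-Hochberg adjustment to raw p-values and counts how many genes pass `padj < fdr_target` at several candidate `|log2FC|` cutoffs. The function names and the cutoff grid are assumptions.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone and <= 1
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def genes_passing(pvalues, log2fc, fdr_target=0.05, lfc_grid=(0.0, 0.5, 1.0, 1.5, 2.0)):
    """For each |log2FC| cutoff, count genes with BH-adjusted p < fdr_target."""
    padj = benjamini_hochberg(pvalues)
    return {
        cutoff: sum(1 for q, lfc in zip(padj, log2fc) if q < fdr_target and abs(lfc) >= cutoff)
        for cutoff in lfc_grid
    }
```

The adjustment follows the same step-up rule as R's `p.adjust(method = "BH")`, so the counts can be cross-checked against DESeq2's `padj` column when the same p-values are used.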
| Metric | Single Method | Ensemble (Brown's) |
|---|---|---|
| False Positive Rate | Higher | 33% lower |
| Reproducibility | Variable | Higher |
| Confidence | Method-specific | Consensus-based |
| Publication Impact | Good | Better |
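"Consensus-based" confidence and the direction checking mentioned in the feature table can be illustrated with simple voting: a gene counts as consensus DE if at least `min_votes` methods call it significant and all of those methods agree on the sign of its fold change. This is a minimal sketch of the voting idea only; the nested-dict input layout, `min_votes` and `padj_cutoff` are assumptions and not RAPTOR's ensemble API.

```python
def voting_consensus(results, min_votes=2, padj_cutoff=0.05):
    """results: {method: {gene_id: (log2fc, padj)}}.

    Returns {gene_id: "up" | "down"} for genes called significant by at least
    `min_votes` methods whose fold changes all point the same way.
    """
    calls = {}
    for per_gene in results.values():
        for gene, (lfc, padj) in per_gene.items():
            if padj is not None and padj < padj_cutoff and lfc != 0:
                calls.setdefault(gene, []).append(1 if lfc > 0 else -1)
    consensus = {}
    for gene, signs in calls.items():
        # Direction check: every significant call must share one sign
        if len(signs) >= min_votes and len(set(signs)) == 1:
            consensus[gene] = "up" if signs[0] > 0 else "down"
    return consensus
```

For instance, a gene that DESeq2 and limma call up-regulated but edgeR calls down-regulated is dropped even though all three methods find it significant.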
We welcome contributions! RAPTOR is open-source and aims to make free science accessible to everyone.
# Fork and clone
git clone https://github.com/YOUR_USERNAME/RAPTOR.git
cd RAPTOR
# Create feature branch
git checkout -b feature/amazing-feature
# Make changes and test
pytest tests/
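# Commit and push your branch, then open a pull request on GitHub
git add -A
git commit -m "Add amazing feature"
git push origin feature/amazing-feature
See CONTRIBUTING.md for detailed guidelines.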
- 🐛 Report bugs via Issues
- ✨ Request features
- 📝 Improve documentation
- 🔧 Submit pull requests
- 💡 Share use cases and feedback
- ⭐ Star the repository
If you use RAPTOR in your research, please cite:
@software{bolouki2026raptor,
author = {Bolouki, Ayeh},
title = {RAPTOR: RNA-seq Analysis Pipeline Testing and Optimization Resource},
year = {2026},
version = {2.2.0},
publisher = {Zenodo},
doi = {10.5281/zenodo.17607161},
url = {https://github.com/AyehBlk/RAPTOR}
}
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2026 Ayeh Bolouki
Ayeh Bolouki
- 🏛️ GIGA, University of Liège, Belgium
- 📧 Email: ayehbolouki1988@gmail.com
- 🐙 GitHub: @AyehBlk
- 🔬 Research: Computational Biology, Bioinformatics, Multi-omics Analysis
- 📖 Documentation: docs/
- 🐛 Issues: GitHub Issues
- 💬 Discussions: GitHub Discussions
- 📧 Email: ayehbolouki1988@gmail.com
RAPTOR builds on the excellent work of the RNA-seq community:
- Bioconductor community for the R package ecosystem
- DESeq2 (Love et al., 2014) - Differential expression analysis
- edgeR (Robinson et al., 2010) - Empirical analysis of DGE
- limma (Ritchie et al., 2015) - Linear models for microarray and RNA-seq
- Salmon (Patro et al., 2017) - Wicked-fast transcript quantification
- Kallisto (Bray et al., 2016) - Near-optimal probabilistic RNA-seq quantification
- STAR (Dobin et al., 2013) - Ultrafast universal RNA-seq aligner
- All users who provided feedback and suggestions
⭐ Star this repository if you find RAPTOR useful!
RAPTOR v2.2.0 - Making pipeline selection evidence-based, not guesswork 🦖