
li-lab-mcgill/scGRIP

scGRIP: Deep Learning Single-Cell Gene Regulatory Inference with Prior Knowledge

scGRIP overview

Overview

Multi-omics sequencing technologies can jointly measure the transcriptome and chromatin accessibility at single-cell resolution. This enables inference of gene regulatory networks (GRNs) at the cellular level, thereby elucidating high-resolution differential GRNs associated with diseases. However, existing methods lack interpretability and scalability. We present single-cell Gene Regulatory Inference with Prior knowledge (scGRIP). First, we treat transcription factors (TFs), target genes (TGs), and regulatory elements (REs) as nodes and their potential TF-RE and RE-TG interactions as edges, using a prior cis-regulatory knowledge graph. Second, we tokenize single-cell chromatin accessibility and gene expression with a shared codebook to compute cell-specific node embeddings. Third, we incorporate a GraphSHAP technique to infer GRN edge attributions at the single-cell level. We benchmarked scGRIP against state-of-the-art methods, including LINGER and scGLUE, across multiple independent datasets. Our results demonstrate that scGRIP consistently outperforms existing approaches at three levels of inference: cell-specific, cell-type-specific, and condition-specific GRNs.
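The GraphSHAP step attributes a model output to individual GRN edges using Shapley values. As a rough illustration of the underlying idea only (not scGRIP's GraphSHAP implementation, whose approximation scheme is not described here), the sketch below computes exact Shapley values for the edges of a tiny hypothetical regulon. The value function toy_tg_output, the TF activities and the edge names are all made up for the example; in scGRIP the value would come from the trained model.

```python
from itertools import combinations
from math import factorial


def shapley_edge_attribution(edges, value):
    """Exact Shapley value of each edge under a set function `value`.

    `edges` is a list of hashable edge identifiers and `value(S)` maps a
    frozenset of kept edges to a scalar model output. This makes about
    n * 2^n calls to `value`, so it only works for a handful of edges.
    """
    n = len(edges)
    phi = {}
    for e in edges:
        others = [x for x in edges if x != e]
        total = 0.0
        for k in range(n):  # k = size of the coalition that excludes e
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of e when added to this coalition.
                total += weight * (value(coalition | {e}) - value(coalition))
        phi[e] = total
    return phi


# Hypothetical toy regulon: two TFs bind RE1, which regulates TG.
TF_ACTIVITY = {"TF1": 2.0, "TF2": 1.0}
EDGES = [("TF1", "RE1"), ("TF2", "RE1"), ("RE1", "TG")]


def toy_tg_output(kept):
    """TG output = summed activity of TFs whose TF-RE1-TG path is fully kept."""
    if ("RE1", "TG") not in kept:
        return 0.0
    return sum(a for tf, a in TF_ACTIVITY.items() if (tf, "RE1") in kept)


phi = shapley_edge_attribution(EDGES, toy_tg_output)
print(phi)
```

For this toy graph the attributions come out as roughly 1.0 for TF1-RE1, 0.5 for TF2-RE1 and 1.5 for RE1-TG, and they sum to 3.0, the output with all edges kept (the Shapley efficiency property). Exact enumeration grows exponentially with the number of edges, so real GRNs require sampling or other approximations.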

Expected Inputs

The code expects paired multiome data in AnnData/.h5ad format.

  • RNA counts: RNA_count.h5ad
  • ATAC counts: ATAC_count.h5ad
  • Cell labels in an .obs column, typically cell_type (the column name is passed to training via --cell_type_col)
  • Peak coordinates in atac.var, or peak names formatted like chr:start-end
  • Gene genomic coordinates in rna.var
  • cisTarget motif resources for TF-RE prior construction
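Peak coordinates may come either from atac.var or from peak names of the form chr:start-end. A minimal sketch of parsing such names, assuming the chromosome part contains no colon and coordinates are plain integers; the Peak type and parse_peak_name function are hypothetical helpers, not part of the repository.

```python
import re
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


# chr:start-end, e.g. "chr1:10000-10500"; the chromosome part may not contain ":"
_PEAK_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_peak_name(name: str) -> Peak:
    """Parse a peak name in chr:start-end format into its coordinates."""
    match = _PEAK_RE.match(name.strip())
    if match is None:
        raise ValueError(f"peak name {name!r} is not in chr:start-end format")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError(f"peak {name!r} has end < start")
    return Peak(match["chrom"], start, end)


if __name__ == "__main__":
    print(parse_peak_name("chr1:10000-10500"))
```

With anndata installed, the same helper can be mapped over every peak, for example [parse_peak_name(n) for n in anndata.read_h5ad("ATAC_count.h5ad").var_names]. Names in other layouts (such as chr-start-end) raise ValueError rather than being silently misparsed.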

Installation

pip install -r requirements.txt

Workflow

1. Build gene-RE links

preprocess/process_re_tg.py creates a gene-to-regulatory-element matrix from genomic proximity, optionally combined with correlation- or GBM-based linking, as selected by --flag.

Key options:

  • --flag nearby|correlation|nearby+correlation|gbm|both
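  • --distance: maximum genomic distance, in base pairs, for a gene-RE link (e.g. 1000000)
  • --distance_str: short label for that distance (e.g. 1m); it appears in output file names such as hvg_only_nearby_with_gene_gene_1m_threshold3_GRN.pkl
  • --top_n_genes: number of genes to retain (e.g. 3000)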
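The nearby mode links genes to regulatory elements by genomic proximity. A minimal sketch of that idea, under assumptions not confirmed by the repository: each gene is represented by a single anchor position (e.g. its TSS), distance is measured to the peak midpoint, and the window is symmetric with half-width --distance. nearby_links is a hypothetical helper; process_re_tg.py may define distance differently and stores its result as a sparse matrix rather than a list of pairs.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


def nearby_links(genes, peaks, distance):
    """Link each gene to every peak whose midpoint lies within `distance` bp.

    genes: dict mapping gene name -> (chrom, anchor position), e.g. the TSS
    peaks: list of (chrom, start, end) tuples; a peak is referred to by its index
    Returns a list of (gene, peak_index) pairs.
    """
    # Group peak midpoints by chromosome and sort them for binary search.
    by_chrom = defaultdict(list)
    for idx, (chrom, start, end) in enumerate(peaks):
        by_chrom[chrom].append(((start + end) // 2, idx))
    for entries in by_chrom.values():
        entries.sort()
    midpoints = {c: [m for m, _ in entries] for c, entries in by_chrom.items()}

    links = []
    for gene, (chrom, pos) in genes.items():
        if chrom not in by_chrom:
            continue  # no peaks on this chromosome
        lo = bisect_left(midpoints[chrom], pos - distance)
        hi = bisect_right(midpoints[chrom], pos + distance)
        links.extend((gene, idx) for _, idx in by_chrom[chrom][lo:hi])
    return links
```

Sorting midpoints per chromosome lets each gene find its window with two binary searches instead of scanning every peak, which matters at the 1 Mb window used in the example commands.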

2. Add TF-RE links and assemble the graph

preprocess/process_tf_re.py combines:

  • gene-RE links,
  • TF-RE links derived from cisTarget motif scores,
  • optional gene-gene correlation edges,

into a final sparse GRN adjacency matrix.
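The assembly step merges several edge sources into one graph over TF, RE and gene nodes. The sketch below shows one plausible way to do this, assuming motif scores are given per (TF, RE) pair, that a threshold such as --threshold keeps pairs scoring at or above it, and that the graph is undirected. assemble_grn and its arguments are hypothetical; the repository's actual score semantics and matrix layout may differ. Because nodes are looked up by name, a TF that also appears as a gene is merged into a single node.

```python
def assemble_grn(gene_re_links, motif_scores, threshold, gene_gene_links=()):
    """Combine edge sources into (node_index, edges).

    gene_re_links: iterable of (gene, re_name) pairs
    motif_scores: dict mapping (tf, re_name) -> motif score
    gene_gene_links: iterable of (gene_a, gene_b) pairs (optional)
    Returns a name -> integer index dict and a set of undirected (i, j), i < j.
    """
    node_index = {}

    def node(name):
        # Assign the next free integer index the first time a name is seen.
        if name not in node_index:
            node_index[name] = len(node_index)
        return node_index[name]

    edges = set()

    def add_edge(a, b):
        i, j = node(a), node(b)
        if i != j:
            edges.add((min(i, j), max(i, j)))  # undirected: store each pair once

    for gene, re_name in gene_re_links:
        add_edge(gene, re_name)
    for (tf, re_name), score in motif_scores.items():
        if score >= threshold:  # keep only sufficiently strong motif hits
            add_edge(tf, re_name)
    for gene_a, gene_b in gene_gene_links:
        add_edge(gene_a, gene_b)
    return node_index, edges
```

The resulting index pairs can be loaded into a SciPy sparse matrix, for example with scipy.sparse.coo_matrix((data, (rows, cols)), shape=(n, n)), mirroring each pair if a symmetric matrix is needed.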

3. Train Node2Vec on the graph

preprocess/train_n2v_sparse.py learns structural node embeddings from the GRN adjacency.
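Node2Vec learns embeddings by generating biased second-order random walks, controlled by a return parameter p and an in-out parameter q, and feeding them to a skip-gram model. The sketch below shows only the walk step on a small adjacency dict; whether train_n2v_sparse.py implements its own walks or relies on a library, and which p and q it uses, are not stated here. node2vec_walk is a hypothetical helper.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """One biased Node2Vec walk on an undirected, unweighted graph.

    adj: dict mapping node -> set of neighbours
    Unnormalised transition weights from `cur`, given the previous node `prev`:
      1/p for returning to prev, 1 for neighbours of prev, 1/q otherwise.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])  # sorted for reproducibility with a seeded rng
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is uniform
            continue
        prev = walk[-2]
        weights = []
        for x in nbrs:
            if x == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif x in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move outward, away from prev
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

A small q favours outward, DFS-like exploration, while a small p keeps walks local; many walks per node are then treated as "sentences" for skip-gram training.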

4. Train the graph-aware topic model

model/train_gnn_xtrimo.py is the main training script. It:

  • loads paired RNA/ATAC data,
  • loads the GRN adjacency,
  • optionally loads Node2Vec embeddings,
  • trains the GNN + topic model,
  • evaluates clustering quality,
  • saves checkpoints and UMAP/t-SNE plots.
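The README does not say which clustering metrics the training script reports. Adjusted Rand index (ARI) between the cell_type labels and clusters found on the learned embeddings is a common choice, so it is sketched here as an assumption, in pure Python, following the standard contingency-table formula; scikit-learn's adjusted_rand_score computes the same quantity.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI between two labelings of the same cells (1.0 = identical partitions)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: trivially identical partitions
    # Pairs of cells grouped together in both labelings (contingency cells).
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs grouped together within each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells together
    return (sum_ij - expected) / (max_index - expected)
```

ARI ignores the actual label values, so cluster IDs do not have to match cell-type names; only the grouping of cells is compared.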

Example Commands

The preprocessing scripts use relative paths internally, so the safest approach is to run them from inside their own directories after arranging the expected input files there.

Build gene-RE links:

cd preprocess
python process_re_tg.py --flag nearby --distance 1000000 --distance_str 1m --top_n_genes 3000

Assemble the GRN:

cd preprocess
python process_tf_re.py --flag nearby --distance 1000000 --distance_str 1m --top_n_genes 3000 --threshold 3

Train Node2Vec:

cd preprocess
python train_n2v_sparse.py --flag nearby --matrix_path ./processed/hvg_only_nearby_with_gene_gene_1m_threshold3_GRN.pkl
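Because the preprocessing scripts resolve paths relative to their own directory, the three steps above can be chained from the repository root by setting the working directory per call. The driver below is a hypothetical convenience script, not part of the repository; it reuses exactly the flags from the example commands and only prints the commands unless dry_run is set to False.

```python
import subprocess
import sys
from pathlib import Path

# Flags copied from the example commands in this README; adjust as needed.
PREPROCESS_STEPS = [
    ["process_re_tg.py", "--flag", "nearby", "--distance", "1000000",
     "--distance_str", "1m", "--top_n_genes", "3000"],
    ["process_tf_re.py", "--flag", "nearby", "--distance", "1000000",
     "--distance_str", "1m", "--top_n_genes", "3000", "--threshold", "3"],
    ["train_n2v_sparse.py", "--flag", "nearby", "--matrix_path",
     "./processed/hvg_only_nearby_with_gene_gene_1m_threshold3_GRN.pkl"],
]


def build_commands(python=sys.executable):
    """Prefix each step with the Python interpreter to use."""
    return [[python, *step] for step in PREPROCESS_STEPS]


def run_preprocessing(workdir=Path("preprocess"), dry_run=True):
    """Run each step with cwd=workdir so the scripts' relative paths resolve.

    With dry_run=True the commands are only printed, not executed.
    """
    for cmd in build_commands():
        if dry_run:
            print(f"[{workdir}] {' '.join(cmd)}")
        else:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    run_preprocessing(dry_run=True)  # set dry_run=False to actually execute
```

Passing cwd to subprocess.run avoids chaining cd commands, which would fail if the shell were already inside preprocess/.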

Train the model:

cd model
python train_gnn_xtrimo.py \
  --rna /path/to/RNA_count.h5ad \
  --atac /path/to/ATAC_count.h5ad \
  --adj_path /path/to/GRN.pkl \
  --node2vec_path /path/to/node_embeddings.pt \
  --emb_size 128 \
  --gnn_hidden 256 \
  --num_topics 100 \
  --batch_size 32 \
  --epochs 100 \
  --cell_type_col cell_type

Main Outputs

Depending on the stage, the pipeline writes:

  • processed ATAC subsets and sparse graph matrices under preprocess/processed/
  • Node2Vec embeddings under preprocess/processed/
  • model checkpoints under model/weights/
  • UMAP/t-SNE plots under model/plots/umap/

Python Dependencies

The main Python packages are listed in requirements.txt.
