scGRIP: Deep Learning Single-Cell Gene Regulatory Inference with Prior Knowledge

Overview

Multi-omics sequencing technologies can jointly measure the transcriptome and chromatin accessibility at single-cell resolution. This enables inference of gene regulatory networks (GRNs) at the cellular level, thereby elucidating high-resolution differential GRNs associated with diseases. However, existing methods lack interpretability and scalability. We present single-cell Gene Regulatory Inference with Prior knowledge (scGRIP). First, we treat transcription factors (TFs), target genes (TGs), and regulatory elements (REs) as nodes and their potential TF-RE and RE-TG interactions as edges, using a prior cis-regulatory knowledge graph. Second, we tokenize the single-cell chromatin accessibility and gene expression with a shared codebook to compute cell-specific node embeddings. Third, we incorporate a GraphSHAP technique to infer GRN edge attributions at the single-cell level. We benchmarked scGRIP against state-of-the-art methods, including LINGER and scGLUE, across multiple independent datasets. Our results demonstrate that scGRIP consistently outperforms existing approaches at three levels of inference: cell-specific, cell-type-specific, and condition-specific GRNs.

Expected Inputs

The code expects paired multiome data in AnnData/.h5ad format.

RNA counts: RNA_count.h5ad
ATAC counts: ATAC_count.h5ad
Cell labels in .obs, typically cell_type
Peak coordinates in atac.var, or peak names formatted like chr:start-end
Gene genomic coordinates in rna.var
cisTarget motif resources for TF-RE prior construction
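
If peak coordinates are only available as names in the chr:start-end form, they can be split into coordinates with a few lines of Python. The sketch below is illustrative and is not taken from the repository; the names Peak and parse_peak_name are hypothetical.

```python
import re
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


# Matches names such as "chr1:1000-1500".
_PEAK_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_peak_name(name: str) -> Peak:
    """Split a 'chr:start-end' peak name into (chrom, start, end)."""
    match = _PEAK_RE.match(name.strip())
    if match is None:
        raise ValueError(f"Peak name not in chr:start-end form: {name!r}")
    start, end = int(match["start"]), int(match["end"])
    if start > end:
        raise ValueError(f"Peak start exceeds end: {name!r}")
    return Peak(match["chrom"], start, end)
```

Rejecting malformed names early makes a mismatch between the ATAC peak naming and the expected format visible before any graph construction starts.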

Installation

pip install -r requirements.txt

Workflow

1. Build gene-RE links

preprocess/process_re_tg.py creates a gene-to-regulatory-element matrix using genomic proximity and optional correlation/GBM logic.

Key options:

--flag nearby|correlation|nearby+correlation|gbm|both
--distance
--distance_str
--top_n_genes
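
To make the proximity idea concrete, here is a minimal sketch of linking genes to nearby regulatory elements. It is not the repository's implementation: the function name link_genes_to_peaks, the input layout, and the choice to measure distance from the gene start to the peak midpoint are all assumptions made for illustration.

```python
from collections import defaultdict


def link_genes_to_peaks(genes, peaks, distance):
    """Return {gene: [peak_name, ...]} for peaks within `distance` bp of the gene start.

    genes: dict gene_name -> (chrom, gene_start)        (assumed layout)
    peaks: dict peak_name -> (chrom, peak_start, peak_end)
    """
    # Group peaks per chromosome so each gene only scans its own chromosome.
    peaks_by_chrom = defaultdict(list)
    for name, (chrom, start, end) in peaks.items():
        midpoint = (start + end) // 2
        peaks_by_chrom[chrom].append((midpoint, name))

    links = {}
    for gene, (chrom, gene_start) in genes.items():
        links[gene] = [
            name
            for midpoint, name in peaks_by_chrom.get(chrom, [])
            if abs(midpoint - gene_start) <= distance
        ]
    return links
```

A linear scan per chromosome is enough for a sketch; with many peaks, sorting the midpoints and using binary search would avoid rescanning every peak for every gene.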

2. Add TF-RE links and assemble the graph

preprocess/process_tf_re.py combines:

gene-RE links,
TF-RE links derived from cisTarget motif scores,
optional gene-gene correlation edges,

into a final sparse GRN adjacency matrix.
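
The assembly step can be pictured as thresholding TF-RE motif scores and merging the surviving edges with the gene-RE links into one sparse edge set. The sketch below only illustrates that idea under stated assumptions: assemble_grn is a hypothetical name, the "score >= threshold" rule and the undirected edges are assumptions, and the repository's actual matrix format is not shown here.

```python
def assemble_grn(re_tg_links, tf_re_scores, threshold):
    """Build a node index and a sparse, undirected edge set.

    re_tg_links:  iterable of (re_name, gene_name) pairs
    tf_re_scores: dict (tf_name, re_name) -> motif score
    Returns (node_index, edges), where edges holds (row, col) index pairs.
    """
    node_index = {}

    def node_id(kind, name):
        # Key nodes by (kind, name) so a TF and its own gene stay distinct
        # nodes (an assumption of this sketch).
        return node_index.setdefault((kind, name), len(node_index))

    edges = set()
    for re_name, gene in re_tg_links:
        i, j = node_id("RE", re_name), node_id("TG", gene)
        edges.update({(i, j), (j, i)})

    for (tf, re_name), score in tf_re_scores.items():
        if score >= threshold:  # keep only sufficiently strong motif hits
            i, j = node_id("TF", tf), node_id("RE", re_name)
            edges.update({(i, j), (j, i)})

    return node_index, edges
```

Storing only the nonzero (row, col) pairs mirrors a coordinate-style sparse matrix, which keeps memory proportional to the number of edges rather than the square of the number of nodes.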

3. Train Node2Vec on the graph

preprocess/train_n2v_sparse.py learns structural node embeddings from the GRN adjacency.
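
For readers unfamiliar with Node2Vec, the core of the method is a biased second-order random walk whose node sequences are then fed to a skip-gram model. The sketch below shows only the walk for an unweighted graph; it is a generic illustration, not the code in train_n2v_sparse.py, and the function name and the dict-of-sets graph layout are assumptions.

```python
import random


def node2vec_walk(adj, start, length, p=1.0, q=1.0, rng=None):
    """Generate one biased walk over `adj` (dict: node -> set of neighbours).

    p is the return parameter and q the in-out parameter of Node2Vec.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbours = sorted(adj.get(cur, ()))
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # First step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbours))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj.get(prev, ()):
                weights.append(1.0)      # stay within one hop of the previous node
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk
```

A small q pushes walks outward and emphasizes global structure, while a small p keeps walks local; passing a seeded random.Random makes the walks reproducible.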

4. Train the graph-aware topic model

model/train_gnn_xtrimo.py is the main training script. It:

loads paired RNA/ATAC data,
loads the GRN adjacency,
optionally loads Node2Vec embeddings,
trains the GNN + topic model,
evaluates clustering quality,
saves checkpoints and UMAP/t-SNE plots.

Example Commands

The preprocessing scripts use relative paths internally, so the safest approach is to run them from inside their own directories after arranging the expected input files there. Each block below is written to be run from the repository root.
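
If the steps are driven from Python rather than a shell, the same effect can be had by setting the working directory of each subprocess. This is a minimal sketch, not part of the repository; build_step and run_step are hypothetical helper names.

```python
import subprocess
import sys
from pathlib import Path


def build_step(script, *args):
    """Return (command, working_directory) for a script path given relative to the repo root."""
    script_path = Path(script)
    command = [sys.executable, script_path.name, *map(str, args)]
    return command, script_path.parent


def run_step(repo_root, script, *args):
    """Run one pipeline script from inside its own directory."""
    command, subdir = build_step(script, *args)
    # cwd makes the script's internal relative paths resolve inside its own directory.
    subprocess.run(command, cwd=Path(repo_root) / subdir, check=True)
```

For example, run_step(".", "preprocess/process_re_tg.py", "--flag", "nearby", "--distance", 1000000, "--distance_str", "1m", "--top_n_genes", 3000) would mirror the first shell command below; check=True stops the pipeline if a step exits with an error.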

Build gene-RE links:

cd preprocess
python process_re_tg.py --flag nearby --distance 1000000 --distance_str 1m --top_n_genes 3000

Assemble the GRN:

cd preprocess
python process_tf_re.py --flag nearby --distance 1000000 --distance_str 1m --top_n_genes 3000 --threshold 3

Train Node2Vec:

cd preprocess
python train_n2v_sparse.py --flag nearby --matrix_path ./processed/hvg_only_nearby_with_gene_gene_1m_threshold3_GRN.pkl

Train the model:

cd model
python train_gnn_xtrimo.py \
  --rna /path/to/RNA_count.h5ad \
  --atac /path/to/ATAC_count.h5ad \
  --adj_path /path/to/GRN.pkl \
  --node2vec_path /path/to/node_embeddings.pt \
  --emb_size 128 \
  --gnn_hidden 256 \
  --num_topics 100 \
  --batch_size 32 \
  --epochs 100 \
  --cell_type_col cell_type

Main Outputs

Depending on the stage, the pipeline writes:

processed ATAC subsets and sparse graph matrices under preprocess/processed/
Node2Vec embeddings under preprocess/processed/
model checkpoints under model/weights/
UMAP/t-SNE plots under model/plots/umap/

Python Dependencies

The main Python packages are listed in requirements.txt.
