
HemaScribe

This package provides two main functions for analyzing single-cell RNA-sequencing (scRNA-seq) data from the mouse hematopoietic system: HemaScribe and HemaScape. HemaScribe annotates cell types in mouse bone marrow, with a focus on hematopoietic stem and progenitor cells (HSPCs). It performs multilevel annotation using a hierarchy of classifiers trained on publicly available bulk RNA-sequencing data and newly generated scRNA-seq data. HemaScape performs trajectory analysis by mapping query cells onto a pre-built DensityPath trajectory reference with the Symphony algorithm. This yields an Elastic Embedding (EE) projection of the query cells together with predicted branch assignments, density clusters, refined density clusters along each trajectory branch, and pseudotime. For more details, please see our preprint on bioRxiv:

James W. Swann, Jun Hou Fung, Ziwei Chen, et al. Quantitative molecular cartography of emergency myelopoiesis reveals conserved modules of hematopoietic activation. bioRxiv 2025.05.28.656712; doi: https://doi.org/10.1101/2025.05.28.656712

The basic outputs of HemaScribe annotation are:

  • A per-cell “hematopoietic score” that indicates whether the cell (coming from a bone marrow sample) is hematopoietic or not,

  • A “broad” classifier that annotates the major hematopoietic populations (HSPC, erythroid, myeloid, immune, etc.),

  • A “fine” classifier that classifies HSPCs into HSCs, MPP2/3/4s, and other early progenitor cell types,

  • A “combined” summary of the outputs of the broad and fine classifiers, and

  • An HSPC-focused annotation using the hash labels from the new scRNA-seq data, as well as a GMP-focused annotation.
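The list above describes each output on its own. As a purely illustrative sketch, and not HemaScribe's actual logic, the base-R function below shows one way a hematopoietic score, a broad label and a fine label could be merged into a single combined label. The column names `hemato_score`, `broad` and `fine`, the `NonHematopoietic` label, the 0.5 cutoff and the merging rule are all hypothetical; see the package documentation for the real output columns.

```r
# Hypothetical per-cell classifier outputs (illustrative values only).
annot <- data.frame(
  hemato_score = c(0.95, 0.90, 0.10, 0.80),
  broad        = c("HSPC", "Myeloid", "HSPC", "HSPC"),
  fine         = c("HSC", NA, "MPP2", "MPP4"),
  row.names    = paste0("cell", 1:4),
  stringsAsFactors = FALSE
)

# Hypothetical merging rule (NOT HemaScribe's documented logic):
#  - cells scoring below `score_cutoff` are labeled "NonHematopoietic";
#  - HSPCs with a fine label take that fine label;
#  - every other cell keeps its broad label.
combine_labels <- function(df, score_cutoff = 0.5) {
  out <- ifelse(df$broad == "HSPC" & !is.na(df$fine), df$fine, df$broad)
  out[df$hemato_score < score_cutoff] <- "NonHematopoietic"
  setNames(out, rownames(df))
}

combine_labels(annot)
```

Here cell3 is called an HSPC by the broad step but is overridden because its (made-up) hematopoietic score falls below the cutoff.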

HemaScribe annotation is compatible with both Seurat v5 and SingleCellExperiment data types.

The basic outputs of HemaScape trajectory mapping include:

  • Predicted query EE coordinates,

  • Predicted query branch assignment with corresponding probabilities,

  • Predicted query density cluster and refined density clusters along each trajectory branch with corresponding probabilities, and

  • Predicted pseudotime based on the predicted EE coordinates and the reference density landscape.
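Because branch and cluster predictions come with probabilities, a common downstream step is to keep the most probable branch per cell and flag low-confidence calls. The sketch below does this in base R for a hypothetical cells-by-branches probability matrix. The branch names and the 0.5 cutoff are invented for illustration, and this is not a description of how HemaScape computes its assignments internally.

```r
# Hypothetical cells x branches probability matrix (each row sums to 1).
probs <- matrix(
  c(0.80, 0.15, 0.05,
    0.40, 0.35, 0.25,
    0.10, 0.10, 0.80),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cell", 1:3), c("branchA", "branchB", "branchC"))
)

# Pick the most probable branch per cell and flag calls below `min_prob`.
assign_branch <- function(p, min_prob = 0.5) {
  best <- max.col(p, ties.method = "first")       # column index of the row maximum
  best_prob <- p[cbind(seq_len(nrow(p)), best)]   # probability of that column
  data.frame(
    branch      = colnames(p)[best],
    probability = best_prob,
    confident   = best_prob >= min_prob,
    row.names   = rownames(p),
    stringsAsFactors = FALSE
  )
}

assign_branch(probs)
```

In this toy matrix, cell2 is assigned to branchA but flagged as not confident, since no branch reaches the cutoff.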

Installation

HemaScribe requires the following non-CRAN packages to run: scater, SingleR, UCell, AnnotationDbi, and org.Mm.eg.db. Please ensure that they are available prior to installing HemaScribe. These packages can be installed from Bioconductor:

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("scater")
BiocManager::install("SingleR")
BiocManager::install("UCell")
BiocManager::install("AnnotationDbi")
BiocManager::install("org.Mm.eg.db")

Afterwards, you can install the current version of HemaScribe from GitHub with:

# install.packages("devtools")
devtools::install_github("RabadanLab/HemaScribe")

When the package is installed, the reference data files it needs are downloaded automatically from the latest release. The package cannot function without them.

Example usage

Example 1: HemaScribe Annotation

We use HemaScribe to annotate a single cell dataset of lineage-/c-Kit+ (LK) hematopoietic progenitors generated in our lab by Collins et al. (2024).

library(HemaScribe)
suppressPackageStartupMessages({
  library(Seurat)
  library(scuttle)
})

# Download Collins et al. (2024) data from GEO.
options(timeout = 1000)
collins2024.seurat <- ReadMtx(
  mtx = "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM7712015&format=file&file=GSM7712015%5FAA003%5Fmatrix%2Emtx%2Egz",
  cells = "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM7712015&format=file&file=GSM7712015%5FAA003%5Fbarcodes%2Etsv%2Egz",
  features = "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSM7712015&format=file&file=GSM7712015%5FAA003%5Ffeatures%2Etsv%2Egz"
)

collins2024.seurat <- CreateSeuratObject(collins2024.seurat, project="Collins2024")

# It is important to log-normalize the data before calling HemaScribe().
collins2024.seurat <- NormalizeData(collins2024.seurat, verbose = FALSE)

# Run HemaScribe.
collins2024.seurat <- HemaScribe(collins2024.seurat, return.full = FALSE)
#> Calculating hematopoietic scores
#> Classifying into broad cell types
#> Classifying into fine cell subtypes
#> Returning final annotations

head(collins2024.seurat$HSPC.annot)
#> AAACCCAAGCCTGTCG-1 AAACCCAAGCTCACTA-1 AAACCCAAGTATGTAG-1 AAACCCACAAATTGGA-1 
#>              "GMP"              "GMP"             "MPP2"             "MPP4" 
#> AAACCCACAATTTCGG-1 AAACCCATCACGTCCT-1 
#>             "EryP"              "GMP"

Normally, HemaScribe annotation should be run on processed data, after quality control filtering and other preprocessing steps, which we have omitted here for brevity. The only preprocessing step we insist on is log-normalization. With return.full = TRUE, HemaScribe produces a separate report of all the classifier results. With return.full = FALSE, as in this example, a Seurat object is returned with its metadata populated by the classifier results, which may be more convenient for downstream analysis.
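Since log-normalization is the one required preprocessing step, it helps to see what it computes. By default, Seurat's NormalizeData() uses the LogNormalize method: each cell's counts are divided by that cell's total count, multiplied by a scale factor (10,000 by default), and transformed with log1p (the natural log of one plus the value). The base-R function below is a minimal sketch of that transformation on a toy matrix, for illustration only; with real data, use NormalizeData() as in the example above.

```r
# Toy genes x cells count matrix (values are made up).
counts <- matrix(
  c(10,  0, 5,
     0, 20, 5,
    90, 80, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Kit", "Procr", "Gata1"), paste0("cell", 1:3))
)

# Sketch of the LogNormalize transformation:
# scale every cell to `scale_factor` total counts, then apply log1p.
log_normalize <- function(m, scale_factor = 1e4) {
  lib_size <- colSums(m)                       # total counts per cell
  log1p(sweep(m, 2, lib_size, "/") * scale_factor)
}

norm <- log_normalize(counts)
round(norm, 3)
```

As a sanity check, every column of `expm1(norm)` sums to the scale factor. A cell with zero total counts would produce NaN values here.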

The classifiers can also be run individually. See the documentation for more details.

Example 2: HemaScape trajectory mapping

Next, we apply HemaScape to perform trajectory mapping on the dataset.

# Run HemaScape.
# If needed, a key indicating the sequencing batch can be supplied through the `vars` argument.
collins2024.seurat <- HemaScape(collins2024.seurat)
#> Predict EE coordinates
#> Predict branch assignment
#> Predict density cluster
#> Predict refined density clusters along each trajectory branch
#> Predict pseudotime
#> Returning final mapping results

We can now visualize the outputs of HemaScribe and HemaScape.

# Inspect the results

## View HemaScribe-annotated clusters
cols <- c("HSC" = "navy", "STHSC" = "mediumturquoise", "MPP2" = "blue", 
          "FcG_neg_MPP3" = "deeppink", "FcG_pos_MPP3" = "darkmagenta", 
          "MPP4" = "gold", "GMP" = "seagreen", "MkP" = "lightgreen",
          "EryP" = "tomato", "CLP" = "slateblue", "NotHSPC" = "grey87")
collins2024.seurat$HSPC.annot <- factor(collins2024.seurat$HSPC.annot, levels=names(cols))
DimPlot(collins2024.seurat, reduction = "EE", group.by = "HSPC.annot", cols = cols)

## View predicted branch assignments on predicted EE embedding
DimPlot(collins2024.seurat, reduction = "EE", group.by = "branch_pred")

## View predicted density clusters on predicted EE embedding
DimPlot(collins2024.seurat, reduction = "EE", group.by = "density_cluster_pred")

## View predicted pseudotime on predicted EE embedding
FeaturePlot(collins2024.seurat, reduction = "EE", features = "pseudotime_pred")
#> Warning: The `slot` argument of `FetchData()` is deprecated as of SeuratObject 5.0.0.
#> ℹ Please use the `layer` argument instead.
#> ℹ The deprecated feature was likely used in the Seurat package.
#>   Please report the issue at <https://github.com/satijalab/seurat/issues>.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
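Beyond plots, the predicted columns can also be summarized numerically, for example by cross-tabulating HemaScribe labels against predicted branches or computing the median predicted pseudotime per branch. The sketch below uses a small mock data frame in place of the real metadata (with the Seurat object above, the metadata data frame can be obtained with `collins2024.seurat[[]]`). The column names match those used in the plotting code above, but the values and branch labels are invented.

```r
# Mock stand-in for collins2024.seurat[[]]; values and branch labels are hypothetical.
meta <- data.frame(
  HSPC.annot      = c("HSC", "MPP2", "GMP", "GMP", "EryP", "MPP4"),
  branch_pred     = c("branch1", "branch2", "branch1", "branch1", "branch2", "branch1"),
  pseudotime_pred = c(0.05, 0.30, 0.70, 0.80, 0.90, 0.40),
  stringsAsFactors = FALSE
)

# Cross-tabulate HemaScribe labels against predicted branches.
table(meta$HSPC.annot, meta$branch_pred)

# Median predicted pseudotime per predicted branch.
tapply(meta$pseudotime_pred, meta$branch_pred, median)
```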

Like HemaScribe, HemaScape mapping should be performed on processed data after quality control filtering and other preprocessing steps.

Citation

If you use our package, please cite our preprint on bioRxiv.

About

HemaScribe: tools for analyzing single cell mouse hematopoiesis data
