tanaylab/MCNoise

MCNoise

Single-cell RNA sequencing (scRNA-seq) methods are widely used to gather and infer information about gene expression and transcriptional states of cells. However, cell-free RNA in the solution, known as ambient noise, can create systematic biases and batch effects in the form of contamination in expression levels between different cell populations.

The main assumption underlying the noise estimation is that a set of single-cell RNA profiles is observed as a mixture of a true contribution from the cells and a batch-specific contribution from ambient noise. The batch-specific noise is similar for all cells in a batch: molecules sampled from one batch-specific multinomial distribution (the ambient distribution) are mixed with the true molecules. Inferring the mixture of molecules for a single cell is difficult, but when cells are grouped into metacells, and all cells within a metacell are assumed to be sampled from the same true distribution, we can write down the expected number of UMIs from each gene.
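As a rough illustration of the mixture described above, the expected UMI count per gene can be sketched as follows. This is not code from the package: the function name `expected_umis`, its parameters, and the assumption that all cells of the metacell come from a single batch are hypothetical choices made for this example.

```python
def expected_umis(total_umis, noise_fraction, true_profile, ambient_profile):
    """Expected UMIs per gene for one metacell (hypothetical helper, not the MCNoise API).

    Assumes every cell of the metacell comes from one batch, so a single
    noise fraction and a single ambient distribution apply.
    """
    if not 0.0 <= noise_fraction <= 1.0:
        raise ValueError("noise_fraction must be between 0 and 1")
    # Each gene's expectation mixes the true and the ambient multinomial
    # probabilities, weighted by the fraction of UMIs that come from noise.
    return [
        total_umis * ((1.0 - noise_fraction) * true_p + noise_fraction * ambient_p)
        for true_p, ambient_p in zip(true_profile, ambient_profile)
    ]


# Toy example with three genes; the third gene is not expressed by these cells,
# so all of its expected UMIs come from the ambient distribution.
true_profile = [0.70, 0.30, 0.00]
ambient_profile = [0.20, 0.30, 0.50]
expected = expected_umis(10000, 0.1, true_profile, ambient_profile)
```

Genes that the cells do not express are the informative ones here: whatever is observed for them is attributable to the ambient term.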

The package implements a method to estimate scRNA-seq ambient noise levels in different batches, and a noise removal pipeline based on the metacells framework.

Ambient noise estimation

To identify noise-prone features, we perform a two-way clustering over the metacells, which lets us find the noise-prone genes in each cluster. This process assumes that when a gene cluster is expressed at very low levels in a metacell cluster, most of the UMIs observed there are generated by noise. This allows us to take those metacell-gene combinations (named Pgm, Pair of Genes Metacells) and try to deconvolve the observed UMIs into two parts: UMIs generated from noise, and UMIs generated from the native, true expression of this gene module in those metacells (named Egt, Expressed Genes in Type). The key point is that different pairs of metacell and gene clusters should have the same noise estimation if they share the same batch, while the same pair of metacell and gene cluster should have the same Egt across batches. Once we have those candidates, we can use them to estimate the noise levels in each batch. The package estimates the noise of a cell at three different levels: the first is based on UMI count, the second on clusters of metacells and highly differential genes, and the third is batch-specific.
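The constraint structure described above (one noise level per batch, one Egt per pair) can be illustrated with a toy fit. This is only a sketch and not the package's estimator, which relies on the R library zetadiv. It assumes a simplified additive model in which the observed fraction of UMIs for a pair in a batch equals the batch noise level times the ambient fraction of the gene cluster, plus the pair's native expression. The function `fit_noise_levels` and all the numbers are hypothetical; an alternating least-squares update is swapped in purely for illustration.

```python
def fit_noise_levels(rates, ambient, iterations=2000):
    """Toy alternating least-squares fit (hypothetical, not the MCNoise estimator).

    rates[b][p]   : observed UMI fraction of pair p (metacell cluster x gene cluster) in batch b
    ambient[b][p] : fraction of batch b's ambient distribution in the gene cluster of pair p
    Assumed model : rates[b][p] = noise[b] * ambient[b][p] + native[p]
    """
    n_batches, n_pairs = len(rates), len(rates[0])
    noise = [0.0] * n_batches   # one noise level per batch
    native = [0.0] * n_pairs    # one native expression (Egt) per pair, shared by all batches
    for _ in range(iterations):
        # With native expression fixed, each batch's noise level is a 1-D least-squares fit.
        for b in range(n_batches):
            num = sum(ambient[b][p] * (rates[b][p] - native[p]) for p in range(n_pairs))
            den = sum(ambient[b][p] ** 2 for p in range(n_pairs))
            noise[b] = max(0.0, num / den) if den > 0 else 0.0
        # With noise fixed, each pair's native expression is the mean residual over batches.
        for p in range(n_pairs):
            residual = sum(rates[b][p] - noise[b] * ambient[b][p] for b in range(n_batches))
            native[p] = max(0.0, residual / n_batches)
    return noise, native


# Synthetic, noise-free data: 3 batches, 4 pairs (all values are made up).
ambient = [
    [0.04, 0.01, 0.01, 0.02],
    [0.01, 0.04, 0.02, 0.01],
    [0.01, 0.02, 0.04, 0.03],
]
true_noise = [0.05, 0.10, 0.20]
true_native = [0.001, 0.002, 0.0005, 0.0015]
rates = [
    [true_noise[b] * ambient[b][p] + true_native[p] for p in range(4)]
    for b in range(3)
]
est_noise, est_native = fit_noise_levels(rates, ambient)
```

The fit is only identifiable because the ambient fractions differ between batches and gene clusters. If every batch had the same ambient profile, noise and native expression could not be told apart in this toy model.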

Ambient noise removal

After completing the noise estimation process we can move to the next step, correction of the metacell model.

The first step is to re-create the metacell model using the noise estimation data. This changes the cell-cell distances, which in turn determine the derivation of metacells. In general, this allows the cells to re-mix by removing the effects of the batches they originated from.

The first step doesn't change the UMI counts of the cells or of the metacells. To actually remove the noisy UMIs, we perform the second step, which removes UMIs that were likely generated by noise. This is done only at the metacell level.
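A minimal sketch of what metacell-level removal could look like, assuming the noise level and ambient distribution of each batch are already known. The function `remove_ambient_umis` and its arguments are hypothetical and are not part of the package's API. Clipping at zero is an assumption made here so that corrected counts never become negative.

```python
def remove_ambient_umis(observed, batch_totals, noise_fractions, ambient_profiles):
    """Subtract the expected ambient UMIs from one metacell (hypothetical helper).

    observed         : UMIs per gene in the metacell
    batch_totals     : batch name -> total UMIs contributed to the metacell by cells of that batch
    noise_fractions  : batch name -> estimated fraction of UMIs that are ambient noise
    ambient_profiles : batch name -> ambient probability per gene
    """
    corrected = []
    for gene_index, count in enumerate(observed):
        # Expected noise is accumulated over every batch that contributed cells.
        expected_noise = sum(
            batch_totals[batch] * noise_fractions[batch] * ambient_profiles[batch][gene_index]
            for batch in batch_totals
        )
        # Never report a negative count after the subtraction.
        corrected.append(max(0.0, count - expected_noise))
    return corrected


# Toy metacell built from a single batch (all values are made up).
observed = [6500.0, 3000.0, 500.0]
corrected = remove_ambient_umis(
    observed,
    batch_totals={"batch_1": 10000.0},
    noise_fractions={"batch_1": 0.1},
    ambient_profiles={"batch_1": [0.20, 0.30, 0.50]},
)
```

Working on metacell totals rather than single cells keeps the subtraction stable, since the expected noise per gene in one cell is usually a small fraction of a UMI.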

Installation

The package depends heavily on two components:

1. The Python metacells package.

2. The R library zetadiv.

Sadly, there is no pure Python equivalent of the zetadiv library, which currently means that R code has to be run from Python.
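Because of the R dependency, it can help to confirm that R and zetadiv are reachable before running the pipeline. The check below is a sketch that shells out to `Rscript`. The helper `r_package_available` is hypothetical, and the mechanism MCNoise itself uses to call R may be different.

```python
import shutil
import subprocess


def r_package_available(package, timeout=60):
    """Return True if Rscript is on the PATH and can load the given R package."""
    rscript = shutil.which("Rscript")
    if rscript is None:
        return False  # R itself is not installed or not on the PATH
    # requireNamespace() returns FALSE instead of raising when the package is missing,
    # so the result is turned into the exit status of the R process.
    expression = (
        f'quit(status = if (requireNamespace("{package}", quietly = TRUE)) 0 else 1)'
    )
    try:
        result = subprocess.run(
            [rscript, "-e", expression],
            capture_output=True,
            timeout=timeout,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


zetadiv_ready = r_package_available("zetadiv")
```

If `zetadiv_ready` is false, installing R and then running `install.packages("zetadiv")` from an R session is the usual way to provide the library.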

To install MCNoise, you can run:

Vignettes

The generated documentation contains a vignette over the Hourigan human bone marrow data.

References

License (MIT)

Copyright © 2020, 2021, 2022 Weizmann Institute of Science

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
