Single-cell RNA sequencing (scRNA-seq) methods are widely used to gather and infer information about gene expression and the transcriptional states of cells. However, cell-free RNA in the solution, known as ambient noise, can contaminate the measured expression levels, creating systematic biases and batch effects between different cell populations.
The main assumption underlying the noise estimation is that a set of single-cell RNA profiles is observed as a mixture of a true contribution from the cells and a batch-specific contribution from ambient noise. The batch-specific noise is similar for all cells in a batch: molecules sampled from one batch-specific multinomial distribution (the ambient distribution) are mixed with the true molecules. Inferring this mixture for a single cell is difficult, but when cells are grouped into metacells, and all cells within a metacell are assumed to be sampled from the same true distribution, we can write down the expected number of UMIs from each gene.
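As a minimal sketch of this mixture model, assume each batch b has a noise level alpha_b (the fraction of its UMIs that come from ambient RNA) and an ambient distribution a_b over genes, and that a metacell m with true distribution p_m receives N_{m,b} UMIs from cells of batch b. The expected UMI count of gene g in the metacell is then the sum over batches of N_{m,b} * ((1 - alpha_b) * p_{m,g} + alpha_b * a_{b,g}). The function name and argument layout below are hypothetical and are not part of the package's API.

```python
import numpy as np


def expected_metacell_umis(umis_per_batch, true_dist, ambient_dist, noise_fraction):
    """Expected UMIs per gene for one metacell under the assumed two-component mixture.

    umis_per_batch : (B,) total UMIs contributed to the metacell by cells of each batch
    true_dist      : (G,) the metacell's true expression distribution (sums to 1)
    ambient_dist   : (B, G) per-batch ambient multinomial distribution (rows sum to 1)
    noise_fraction : (B,) assumed fraction of each batch's UMIs that come from ambient noise
    """
    umis_per_batch = np.asarray(umis_per_batch, dtype=float)
    true_dist = np.asarray(true_dist, dtype=float)
    ambient_dist = np.asarray(ambient_dist, dtype=float)
    noise_fraction = np.asarray(noise_fraction, dtype=float)

    # Per batch: (1 - alpha_b) * p_m + alpha_b * a_b, giving a (B, G) matrix of mixed distributions.
    mixed = (1.0 - noise_fraction)[:, None] * true_dist[None, :] + noise_fraction[:, None] * ambient_dist
    # Weight each batch's mixed distribution by the UMIs that batch contributes, then sum over batches.
    return umis_per_batch @ mixed
```

For example, `expected_metacell_umis([1000, 500], [0.5, 0.5, 0.0], [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6]], [0.1, 0.2])` returns `[680, 680, 140]`: the third gene is not truly expressed in the metacell, yet it is still expected to collect 140 UMIs, all of them from ambient noise.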
The package implements a method to estimate scRNA-seq ambient noise levels in different batches, together with a noise-removal pipeline, based on the metacells framework.
To identify noise-prone features we perform a two-way clustering over
the metacells, which allows us to find noise-prone genes in each cluster.
This process assumes that a gene cluster expressed at a drastically low
level in a metacell cluster is mostly generated by noise. This allows us
to take those metacell-gene combinations (named Pgm, for Pair of Genes
Metacells) and try to deconvolve the observed UMIs into two parts: UMIs
generated by the noise, and UMIs generated by the native expression of
this gene module in those metacells (named Egt, for Expressed Genes in
Type). The key point is that different pairs of metacell and gene
clusters should have the same noise estimation if they share the same
batch, while the same pair of metacell and gene clusters should have the
same Egt even across batches. Once we have those candidates we can use
them to estimate the noise levels in each batch. The ambient noise
package estimates the noise of a cell at three different levels: the
first is based on the cell's UMI count, the second on clusters of
metacells and highly differential genes, and the third is batch-specific.
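The two constraints above (one noise level per batch shared by all Pgms, one Egt per Pgm shared by all batches) can be illustrated with a small least-squares sketch. This is not the package's estimator, which relies on zetadiv; it assumes a linearized model in which the observed UMI fraction of Pgm p in batch b is approximately Egt_p + alpha_b * A_{p,b}, where A_{p,b} is the hypothetical fraction of batch b's ambient distribution falling on the Pgm's gene cluster. The (1 - alpha_b) factor on the native term is ignored, which is only reasonable when noise levels are small. The function name is hypothetical.

```python
import numpy as np


def estimate_noise_levels(obs_frac, ambient_frac):
    """Jointly estimate per-batch noise levels and per-Pgm native expression (Egt).

    obs_frac     : (P, B) observed UMI fraction of each Pgm in each batch
                   (UMIs of the gene cluster / total UMIs of the metacell cluster)
    ambient_frac : (P, B) fraction of each batch's ambient distribution that falls
                   on the Pgm's gene cluster
    Assumed linearized model: obs_frac[p, b] ~= egt[p] + alpha[b] * ambient_frac[p, b]
    Returns (alpha, egt) as the least-squares solution.
    """
    obs_frac = np.asarray(obs_frac, dtype=float)
    ambient_frac = np.asarray(ambient_frac, dtype=float)
    n_pgm, n_batch = obs_frac.shape

    # One row per (Pgm, batch) observation; columns are [egt_0..egt_{P-1}, alpha_0..alpha_{B-1}].
    design = np.zeros((n_pgm * n_batch, n_pgm + n_batch))
    target = np.zeros(n_pgm * n_batch)
    row = 0
    for p in range(n_pgm):
        for b in range(n_batch):
            design[row, p] = 1.0                         # Egt is shared across batches
            design[row, n_pgm + b] = ambient_frac[p, b]  # the noise level is shared across Pgms
            target[row] = obs_frac[p, b]
            row += 1

    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    egt, alpha = solution[:n_pgm], solution[n_pgm:]
    return alpha, egt
```

Because every batch contributes observations for many Pgms, and every Pgm is observed in several batches, the system is overdetermined. As long as the ambient distributions differ between batches, the noise levels and Egt values can be separated from each other.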
After completing the noise estimation process we can move to the next step, correction of the metacell model.
The first step is to re-create the metacell model using the noise estimation data. This changes the cell-cell distances, which in turn determine how the metacells are derived. In general, this allows the cells to re-mix by removing the effects of the batches they originated from.
The first step does not change the UMI counts of the cells or of the metacells. To actually remove the noisy UMIs we perform a second step, which removes UMIs that were likely generated by the noise. This is done only at the metacell level.
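One simple way to picture the second step, assuming per-batch noise levels alpha_b and ambient distributions a_b have already been estimated, is to subtract from each metacell the expected number of ambient UMIs per gene, the sum over batches of alpha_b * N_{m,b} * a_{b,g}, where N_{m,b} is the number of UMIs the metacell receives from batch b. This is a hypothetical sketch; the package's actual removal rule may differ (for example, it might round or downsample counts rather than subtract expectations).

```python
import numpy as np


def remove_noise_umis(metacell_umis, umis_per_batch, ambient_dist, noise_fraction):
    """Subtract the expected ambient UMIs from one metacell's gene counts.

    metacell_umis  : (G,) observed UMIs per gene in the metacell
    umis_per_batch : (B,) total UMIs the metacell's cells contribute from each batch
    ambient_dist   : (B, G) per-batch ambient distribution (rows sum to 1)
    noise_fraction : (B,) estimated noise level of each batch
    """
    metacell_umis = np.asarray(metacell_umis, dtype=float)
    noisy_umis_per_batch = np.asarray(noise_fraction, dtype=float) * np.asarray(umis_per_batch, dtype=float)
    # Expected noise UMIs per gene: sum_b alpha_b * N_{m,b} * a_{b,g}
    expected_noise = noisy_umis_per_batch @ np.asarray(ambient_dist, dtype=float)
    # Counts cannot be negative, so clip genes where the noise estimate exceeds the observation.
    return np.clip(metacell_umis - expected_noise, 0.0, None)
```

For example, `remove_noise_umis([680, 680, 140], [1000, 500], [[0.1, 0.1, 0.8], [0.2, 0.2, 0.6]], [0.1, 0.2])` returns `[650, 650, 0]`. The expected noise is 30, 30 and 140 UMIs per gene, so the third gene, whose UMIs are entirely explained by the ambient distribution, drops to zero.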
The package depends heavily on two components:
1. The Python metacells framework.
2. The R library zetadiv.
Unfortunately, there is currently no pure Python equivalent of the zetadiv library, which means that R code has to be run from Python.
To install MCNoise, run:
The generated documentation contains a vignette on the Hourigan human bone marrow data.
Copyright © 2020, 2021, 2022 Weizmann Institute of Science
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.