Computational Systems Biology Methods and Protocols.7z

2.2.2 Normalization of scRNA-seq Data with UMIs

To correct for amplification bias, UMIs have been described as a way to count molecules directly [34]. For scRNA-seq, UMIs have been used as an internal validation control and have also been explored as a direct, quantitative measure of gene expression. When UMIs are used, and assuming that the sample is sequenced at sufficient depth to ensure that each cDNA molecule is observed at least once, the number of UMIs linked to each gene is a direct measure of the number of cDNA molecules associated with that gene. It is therefore tempting to use these raw molecular counts, which, unlike the expression estimates from non-UMI protocols, are independent of amplification biases, in all downstream analysis. However, the number of UMIs associated with each gene can still vary between cells for technical rather than biological reasons, such as differences in the efficiency of reverse transcription and other cell-specific technical effects that are independent of amplification. One strongly recommended approach that can help to overcome this problem is to combine spike-in controls with UMIs. As described above, the number of spike-in molecules is theoretically the same across cells, so variability in the number of UMIs associated with spike-in genes across cells is indicative of differences in amplification efficiency, that is, technical variability. Consequently, a spike-in-based normalization step can be used to convert the number of cDNA molecules (the number of associated UMIs) to the number of mRNA molecules. Because UMI-based scRNA-seq protocols currently sequence only a fragment of each molecule (from either the 5′ or the 3′ end of the transcript), normalization by transcript length is unnecessary. However, these tag-based protocols cannot be used to study isoform variants or allele-specific expression.
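The two steps described above, counting distinct UMIs per gene and then rescaling by the spike-in signal, can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions and not the procedure of any particular pipeline: the function names, the `(cell, gene, umi)` read representation, the `ERCC-` prefix used to recognize spike-in genes, and the `spikes_added` parameter (the number of spike-in molecules added to every cell) are all hypothetical choices made for this example. The rescaling treats the fraction of spike-in molecules recovered in a cell as that cell's overall technical efficiency.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell, gene).

    `reads` is an iterable of (cell, gene, umi) tuples. Repeated tuples are
    treated as amplification duplicates of one cDNA molecule and counted once.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for cell, gene, _umi in set(reads):  # set() collapses duplicate reads
        counts[cell][gene] += 1
    return {cell: dict(genes) for cell, genes in counts.items()}


def estimate_mrna_molecules(counts, spikes_added, spike_prefix="ERCC-"):
    """Rescale endogenous UMI counts by the per-cell spike-in recovery.

    Assumption: every cell received `spikes_added` spike-in molecules, so
    (observed spike-in UMIs / spikes_added) estimates the cell's efficiency,
    and dividing a gene's UMI count by that efficiency estimates its number
    of mRNA molecules. Spike-in genes are identified by `spike_prefix`
    (a hypothetical naming convention).
    """
    estimates = {}
    for cell, genes in counts.items():
        spike_umis = sum(n for g, n in genes.items() if g.startswith(spike_prefix))
        if spike_umis == 0:
            raise ValueError(f"no spike-in UMIs observed in {cell}")
        estimates[cell] = {
            gene: n * spikes_added / spike_umis
            for gene, n in genes.items()
            if not gene.startswith(spike_prefix)
        }
    return estimates


if __name__ == "__main__":
    # Toy data: cell1 recovers twice as many spike-in molecules as cell2.
    reads = [
        ("cell1", "GeneA", "AAAT"),
        ("cell1", "GeneA", "AAAT"),  # amplification duplicate, counted once
        ("cell1", "GeneA", "CCGT"),
        ("cell1", "ERCC-1", "GGTA"),
        ("cell1", "ERCC-1", "TTAC"),
        ("cell2", "GeneA", "AAAT"),
        ("cell2", "ERCC-1", "GGTA"),
    ]
    umi_counts = count_umis(reads)
    print(umi_counts)
    print(estimate_mrna_molecules(umi_counts, spikes_added=10))
```

In the toy data, GeneA has two UMIs in cell1 and one in cell2, but cell1 also recovers twice as many spike-in UMIs, so the rescaled estimates for the two cells coincide. This is the sense in which the spike-in signal separates technical from biological differences.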
Most recently, the Census algorithm [124] was introduced to convert conventional measures of relative expression, such as transcripts per million (TPM), in single cells to relative transcript counts without the need for spike-ins or UMIs. "Census counts" eliminate much of the apparent technical variability in single-cell experiments and are thus easier to model with standard regression techniques than normalized read counts. The authors claimed that Census counts could reveal changes in cell fate-dependent gene expression, splicing patterns, and allelic imbalances. Census is implemented in Monocle 2 [30], the second major release of the open-source single-cell toolkit (https://github.com/cole-trapnell-lab/monocle-release).

2.3 Accounting for Technical Noise in scRNA-Seq Experiments

Technical noise is unavoidable, especially in scRNA-seq experiments, owing to the low amount of starting material and overamplification, and it must be quantified to avoid mistaking it for genuine differences in biological expression levels. Once normalized gene expression levels or molecular counts have been generated, it is important to incorporate technical variability

354 Yungang Xu and Xiaobo Zhou
