Computational Systems Biology Methods and Protocols


2.2.2 Normalization of scRNA-seq Data with UMIs


To correct for amplification bias, UMIs have been introduced so that
individual molecules can be counted directly [34]. In scRNA-seq,
UMIs have been used as an internal validation control and have also
been explored as a direct, quantitative measure of gene expression.
When UMIs are used, and assuming that the sample is sequenced
deeply enough that each cDNA molecule is observed at least once,
the number of UMIs linked to each gene is a direct measure of the
number of cDNA molecules derived from that gene. It is therefore
tempting to use these raw molecular counts in all downstream
analyses, because, unlike the expression estimates from non-UMI
protocols, they are independent of amplification biases. However,
the number of UMIs associated with each gene can still vary between
cells for technical rather than biological reasons, such as differences
in reverse transcription efficiency and other cell-specific technical
effects that are independent of amplification.
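
The molecule-counting idea can be illustrated with a minimal Python
sketch. It assumes that reads have already been demultiplexed and
aligned, so that each read is reduced to a hypothetical
(cell barcode, UMI, gene) tuple; reads sharing all three values are
treated as PCR duplicates of a single cDNA molecule. Unlike production
pipelines, this sketch requires exact UMI matches and does not correct
UMI sequencing errors; the function name `count_umis` and the toy
reads are illustrative only.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell, gene).

    reads: iterable of (cell_barcode, umi, gene) tuples, one per aligned read.
    Returns a dict mapping (cell_barcode, gene) -> number of unique UMIs,
    i.e., the number of distinct cDNA molecules observed (exact UMI matching).
    """
    umis = defaultdict(set)
    for cell, umi, gene in reads:
        # Reads with the same UMI for the same cell and gene are PCR copies
        # of one molecule, so adding them to a set keeps only one.
        umis[(cell, gene)].add(umi)
    return {key: len(seen) for key, seen in umis.items()}


# Hypothetical toy reads: the ACGT/Actb read in cell AAAC is duplicated.
reads = [
    ("AAAC", "ACGT", "Actb"),
    ("AAAC", "ACGT", "Actb"),
    ("AAAC", "TTGA", "Actb"),
    ("AAAC", "ACGT", "Gapdh"),
    ("CCGT", "GGCA", "Actb"),
]
print(count_umis(reads))
# {('AAAC', 'Actb'): 2, ('AAAC', 'Gapdh'): 1, ('CCGT', 'Actb'): 1}
```

Five reads collapse to four molecules: the duplicated read is counted
once, whereas the same UMI sequence attached to Gapdh is counted as a
separate molecule because counting is done per gene.
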
One strongly recommended approach to overcome this problem is to
combine spike-in controls with UMIs. As described above, the number
of spike-in molecules added is in theory the same for every cell, so
variability in the number of UMIs associated with spike-in genes
across cells reflects technical variability, such as differences in
mRNA capture and reverse transcription efficiency between cells.
Consequently, a spike-in-based normalization step can be used to
convert the number of cDNA molecules (the number of associated
UMIs) into the number of mRNA molecules. Because current UMI-based
scRNA-seq protocols sequence only a fragment of each molecule
(from either the 5′ or the 3′ end of the transcript), normalization
for transcript length is unnecessary. However, these tag-based
protocols cannot be used to study isoform variants or allele-specific
expression. Most recently, the Census algorithm [124] was introduced
to convert conventional measures of relative expression in single
cells, such as transcripts per million (TPM), into relative transcript
counts without the need for spike-ins or UMIs.
“Census counts” eliminate much of the apparent technical variability
in single-cell experiments and are therefore easier to model with
standard regression techniques than normalized read counts. The
authors reported that Census counts could reveal changes in cell
fate-dependent gene expression, splicing patterns, and allelic
imbalances. Census is implemented in Monocle 2 [30], the second
major release of the open-source single-cell toolkit Monocle
(https://github.com/cole-trapnell-lab/monocle-release).
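
The spike-in-based conversion of UMI counts to mRNA molecule numbers
described in this subsection can be sketched as follows. This is a
simplified illustration under stated assumptions, not the procedure of
any specific tool: it assumes that the number of molecules of each
spike-in transcript added to every cell is known (for example, from
the spike-in concentration and dilution), pools all spike-ins into a
single per-cell recovery rate, and divides each cell's endogenous UMI
counts by that rate. The function name `spike_in_normalize` and all
numbers are hypothetical.

```python
import numpy as np


def spike_in_normalize(endo_umis, spike_umis, spike_molecules_added):
    """Convert UMI counts to estimated mRNA molecule numbers using spike-ins.

    endo_umis: (genes x cells) UMI counts for endogenous genes.
    spike_umis: (spikes x cells) UMI counts for spike-in transcripts.
    spike_molecules_added: (spikes,) molecules of each spike-in added per
        cell, assumed identical for every cell (hypothetical input).
    Returns (estimated molecules, genes x cells; efficiency per cell).
    """
    endo_umis = np.asarray(endo_umis, dtype=float)
    spike_umis = np.asarray(spike_umis, dtype=float)
    added = np.asarray(spike_molecules_added, dtype=float)
    # Fraction of the added spike-in molecules recovered as UMIs in each cell.
    efficiency = spike_umis.sum(axis=0) / added.sum()
    if np.any(efficiency <= 0):
        raise ValueError("every cell needs at least one spike-in UMI")
    # Divide each column (cell) by its own recovery rate; broadcasts over genes.
    return endo_umis / efficiency, efficiency


added = [100, 50]               # molecules of each spike-in added per cell
spikes = [[20, 40], [10, 20]]   # spike-in UMIs (2 spike-ins x 2 cells)
endo = [[10, 20], [5, 2]]       # endogenous UMIs (2 genes x 2 cells)
molecules, eff = spike_in_normalize(endo, spikes, added)
print(eff)        # [0.2 0.4]
print(molecules)  # ≈ [[50, 50], [25, 5]]
```

In this toy example the first gene has raw counts of 10 and 20, but
because the second cell recovered spike-in molecules twice as
efficiently, both cells are estimated to contain about 50 molecules of
that gene. Pooling all spike-ins into one ratio is the simplest choice;
it ignores any dependence of recovery on spike-in abundance.
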

2.3 Accounting for Technical Noise in scRNA-Seq Experiments


Technical noise is unavoidable, especially in scRNA-seq experiments,
owing to the low amount of starting material and to overamplification;
it must be quantified to avoid mistaking it for genuine differences in
biological expression levels. Once normalized gene expression levels
or molecular counts have been generated, it is important to
incorporate technical variability

354 Yungang Xu and Xiaobo Zhou
