estimates. This applies to any downstream analysis but is particu-
larly important when comparing expression levels between cells or
when assessing the variability of individual genes. Because of the
typically low capture efficiency of current scRNA-seq protocols,
even moderately expressed genes are frequently undetected. Con-
sequently, methods to accurately estimate the extent of this techni-
cal variability are crucial in order to differentiate between genuine
gene expression changes and experimental artefacts.
The use of spike-ins as control genes is appealing, since the
same amount of ERCC (or other) spike-in was added to each cell in
our experiment. In principle, all the variability we observe for these
genes is due to technical noise, whereas endogenous genes are
affected by both technical noise and biological variability. Technical
noise can be removed by fitting a model to the spike-ins and
“subtracting” this from the endogenous genes. There are several
methods available based on this premise (e.g., BASiCS [130, 131],
scLVM [79], RUVg [132]), each using different noise models and
different fitting procedures. Alternatively, one can identify genes
which exhibit significant variation beyond technical noise (e.g.,
distance to median, highly variable genes). However, there are
issues with the use of spike-ins for normalization (particularly
ERCCs, derived from bacterial sequences), including that their
variability can, for various reasons, actually be higher than that of
endogenous genes.
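The idea of fitting a noise model to the spike-ins and subtracting it from the endogenous genes can be sketched as follows. This is a minimal illustration only, not the procedure used by BASiCS, scLVM, or RUVg: it assumes that the technical squared coefficient of variation follows CV^2 = a0 + a1/mean, fits that curve to the spike-ins by ordinary least squares, and reports the excess CV^2 of each endogenous gene. The function names, the simulated counts, and all parameter values are hypothetical.

```python
import numpy as np

def fit_technical_cv2(spike_counts):
    """Fit CV^2 = a0 + a1 / mean to spike-ins (rows = spike-ins, columns = cells)."""
    mean = spike_counts.mean(axis=1)
    var = spike_counts.var(axis=1, ddof=1)
    keep = mean > 0  # spike-ins never detected carry no information
    cv2 = var[keep] / mean[keep] ** 2
    design = np.column_stack([np.ones(keep.sum()), 1.0 / mean[keep]])
    (a0, a1), *_ = np.linalg.lstsq(design, cv2, rcond=None)
    return float(a0), float(a1)

def biological_cv2(gene_counts, a0, a1):
    """Observed CV^2 minus the fitted technical CV^2 at the same mean."""
    mean = gene_counts.mean(axis=1)
    var = gene_counts.var(axis=1, ddof=1)
    excess = np.full(mean.shape, np.nan)  # undefined for undetected genes
    ok = mean > 0
    excess[ok] = var[ok] / mean[ok] ** 2 - (a0 + a1 / mean[ok])
    return excess

# Simulated toy data (assumption: purely Poisson technical noise).
rng = np.random.default_rng(0)
n_cells = 500
spike_means = np.logspace(0, 3, 40)
spikes = rng.poisson(spike_means[:, None], size=(40, n_cells))

# Endogenous genes without biological variability ...
flat = rng.poisson(50.0, size=(20, n_cells))
# ... and with gamma-distributed expression rates (mean 50, CV^2 = 0.5).
rates = rng.gamma(shape=2.0, scale=25.0, size=(20, n_cells))
variable = rng.poisson(rates)

a0, a1 = fit_technical_cv2(spikes)
excess_flat = biological_cv2(flat, a0, a1)
excess_variable = biological_cv2(variable, a0, a1)
```

Genes whose excess CV^2 is clearly above zero are candidates for variation beyond technical noise. A real analysis would need a formal test and a fit that weights spike-ins by their reliability, and it inherits the caveats about spike-ins discussed above.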
Given the issues with using spike-ins, better results can often be
obtained by using endogenous genes instead. Where we have a
large number of endogenous genes that, on average, do not vary
systematically between cells and where we expect technical effects
to affect a large number of genes (a very common and reasonable
assumption), then such methods (e.g., the RUVs method [132])
can perform well. Almost all analysis approaches take technical
noise into account in some way; readers are referred to refs. 88
and 130 for details of the most common strategies.
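As a rough illustration of using endogenous genes in this way, the sketch below estimates cell-level factors of unwanted variation from a set of genes assumed not to vary biologically and regresses those factors out of every gene. It is not the RUVs procedure cited above, only a simplified factor-removal example in the same spirit. The function name, the choice of a single factor, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def remove_unwanted_factors(log_expr, control_idx, k=1):
    """Regress out k factors estimated from control genes.

    log_expr: (genes x cells) log-scale expression matrix.
    control_idx: indices of genes assumed to show no systematic
    biological variation between cells.
    """
    centered = log_expr - log_expr.mean(axis=1, keepdims=True)
    # The right singular vectors of the control genes are cell-level factors.
    _, _, vt = np.linalg.svd(centered[control_idx], full_matrices=False)
    w = vt[:k].T  # shape (cells, k)
    # Least-squares fit of every gene on the factors, then subtract the fit.
    coef, *_ = np.linalg.lstsq(w, centered.T, rcond=None)
    corrected = log_expr - (w @ coef).T
    return corrected, w

# Simulated toy data: one technical factor t that affects every gene.
rng = np.random.default_rng(1)
n_genes, n_cells = 100, 200
t = rng.normal(size=n_cells)
loadings = rng.uniform(0.5, 1.5, size=n_genes)
log_expr = 5.0 + loadings[:, None] * t[None, :]
log_expr = log_expr + rng.normal(scale=0.2, size=(n_genes, n_cells))
# Genes 0-9 are markers, up-regulated in the second half of the cells.
log_expr[:10, 100:] += 2.0

controls = np.arange(10, n_genes)
corrected, w = remove_unwanted_factors(log_expr, controls, k=1)
```

In this toy setting the estimated factor tracks the simulated technical effect, while the group difference in the marker genes is largely preserved. Any biological signal that happens to correlate with the estimated factors would be removed as well, which is why the choice of control genes matters.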
2.4 Getting Biological Insights into Single-Cell RNA Sequencing
In this section, we review applications of scRNA-seq to specific
biological questions that bulk RNA-seq may not be able to answer.
Thus far, single-cell RNA sequencing has already proven highly
effective in unraveling complex cell populations, reconstructing
developmental trajectories, modeling transcriptional dynamics, and
so on. In all the following analyses, we assume that the input data is
a matrix of gene expression or transcript counts that has been
normalized and corrected for technical variability using any of the
approaches described above.
2.4.1 Accounting for Heterogeneity: Cell Identity and Cellular State
Solid tissues of humans and other eukaryotes comprise several
different types of cells. These different cell types have distinct
transcriptomic profiles. Although there are a lot of computational