methods aiming at inferring the heterogeneity based on sequencing
data from bulk samples [133], single-cell transcriptomics provides
a valuable means to characterize a sample in terms of the known and
novel cell types it contains, i.e., its heterogeneity [40, 79,
134–138].
scRNA-seq is useful for cell type identification by clustering
cells on the basis of their expression profiles. Distinct subsets,
potentially corresponding to unknown cell types, can be identified.
In particular, the genes that best distinguish the clusters can be used
to characterize the corresponding cell types. There are two types of clustering methods for
cell type identification based on scRNA-seq data, depending on
whether there is established information or expectation regarding
the relationship between these cells. If there is no prior expectation,
unbiased (unsupervised) clustering methods, such as hierarchical
clustering or PCA-like methods, can be used to group cells according
to their individual expression profiles. For example, Trapnell
et al. used a PCA-like approach to order cells according to their
position along the differentiation cascade [30], generating a
developmental trajectory. This approach was implemented as a
stand-alone, publicly available tool called Monocle
(http://cole-trapnell-lab.github.io/monocle-release/). If prior information is
available, a PCA-like approach can be combined with knowledge of
the expression patterns of a small set of known marker genes,
allowing an approximate spatial map of the tissue under study to
be obtained [135].
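The unsupervised case described above can be illustrated with a minimal sketch of average-linkage hierarchical clustering on log-transformed expression profiles. It is not the procedure of any cited tool. The toy count matrix, the function names, and the choice of Euclidean distance with a pseudocount of 1 are all assumptions made for illustration.

```python
import math

def log_transform(profile):
    # log2(x + 1) dampens the influence of highly expressed genes
    return [math.log2(x + 1.0) for x in profile]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b, dist):
    # mean pairwise distance between the members of two clusters
    total = sum(dist[i][j] for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def hierarchical_clusters(profiles, n_clusters):
    """Agglomerative clustering of cells; returns one label per cell."""
    logged = [log_transform(p) for p in profiles]
    n = len(logged)
    dist = [[euclidean(logged[i], logged[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with one cluster per cell
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], dist)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical toy data: rows are cells, columns are genes (raw counts).
cells = [
    [90, 80, 2, 1],
    [100, 70, 0, 3],
    [85, 95, 1, 0],
    [2, 0, 60, 75],
    [1, 3, 70, 65],
    [0, 2, 80, 90],
]
labels = hierarchical_clusters(cells, n_clusters=2)
print(labels)
```

In practice the number of clusters is rarely known in advance, and the distance would usually be computed on a reduced representation (for example, the leading principal components) rather than on all genes.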
In addition to cell type identification, unsupervised methods
such as PCA can also be used to explore cellular states, for example,
the stage or speed of the cell cycle. Perhaps counterintuitively,
slow-cycling cells tend to have clearer transcriptional signatures of G1/S
versus G2/M stages, whereas fast-cycling cells tend to be more
homogeneous with respect to expression of cell cycle genes. A
recent study of single cells obtained from glioblastomas describes
a computational strategy for quantifying the speed of the cell cycle
in each cell by comparing expression levels of G1/S versus G2/M
genes [139].
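The idea of comparing G1/S and G2/M expression programs in each cell can be sketched as follows. This is a simplified illustration, not the strategy of the cited study [139]: the two short gene lists, the averaging of log-transformed counts, and the `min_score` threshold are assumptions chosen only to make the example concrete.

```python
import math

# Illustrative marker sets (assumed, deliberately short).
G1S_GENES = ["MCM2", "PCNA"]
G2M_GENES = ["TOP2A", "MKI67"]

def program_score(cell, genes):
    # mean log2(count + 1) over the genes of one program
    values = [math.log2(cell.get(g, 0.0) + 1.0) for g in genes]
    return sum(values) / len(values)

def cell_cycle_scores(cell):
    g1s = program_score(cell, G1S_GENES)
    g2m = program_score(cell, G2M_GENES)
    return {"g1s": g1s, "g2m": g2m, "difference": g1s - g2m}

def classify(cell, min_score=2.0):
    # min_score is a hypothetical cutoff below which neither
    # program is considered active
    scores = cell_cycle_scores(cell)
    if max(scores["g1s"], scores["g2m"]) < min_score:
        return "non-cycling"
    return "G1/S" if scores["difference"] > 0 else "G2/M"

# Hypothetical cells: gene name -> raw count.
cell_a = {"MCM2": 30, "PCNA": 40, "TOP2A": 1, "MKI67": 0}
cell_b = {"MCM2": 1, "PCNA": 2, "TOP2A": 50, "MKI67": 35}
cell_c = {"MCM2": 0, "PCNA": 1, "TOP2A": 0, "MKI67": 1}

for name, cell in [("a", cell_a), ("b", cell_b), ("c", cell_c)]:
    print(name, classify(cell))
```

A sharper separation between the two scores across a population would be consistent with the clearer G1/S versus G2/M signatures described above for slow-cycling cells.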
2.4.2 Differential Expression and Alternative Splicing
Detection of differentially expressed genes (DEGs) is typically the first
and most useful analysis for distinguishing different bulk samples or distinct
clusters of single cells based on their transcriptomic profiles. From
a computational perspective, approaches based on standard differential
expression tools for bulk RNA-seq can be used [140–144];
Rapaport et al. gave a comprehensive review and evaluation of these tools
[145]. However, scRNA-seq data are
typically noisier than bulk RNA-seq data, so the technical variability
must be characterized and accounted for before DEG analysis.
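As a rough illustration of testing one gene for differential expression between two clusters of cells, the sketch below computes a log2 fold change and a two-sided rank-sum (Mann-Whitney) test with a normal approximation. It is a generic nonparametric test, not one of the cited tools, and it makes no attempt to model technical variability. The pseudocount, the omission of a tie correction, and the toy counts are assumptions, and a real analysis would also correct for multiple testing across genes.

```python
import math

def average_ranks(values):
    # rank values from 1..n, giving tied values their average rank
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def rank_sum_test(group_a, group_b):
    """Return (U statistic of group_a, two-sided approximate p-value)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2.0
    mean_u = n_a * n_b / 2.0
    # variance without tie correction (a simplifying assumption)
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_a, p_value

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_a + pseudocount) / (mean_b + pseudocount))

# Hypothetical counts of one gene in two clusters of five cells each.
marker_a, marker_b = [50, 62, 48, 70, 55], [3, 0, 5, 1, 2]
flat_a, flat_b = [10, 12, 9, 11, 10], [11, 10, 9, 12, 10]

u_marker, p_marker = rank_sum_test(marker_a, marker_b)
u_flat, p_flat = rank_sum_test(flat_a, flat_b)
print("marker gene:", log2_fold_change(marker_a, marker_b), p_marker)
print("flat gene:", log2_fold_change(flat_a, flat_b), p_flat)
```

With only five cells per cluster the normal approximation is coarse, so the p-values here serve only to show the direction of the comparison.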
Recently, alternative approaches [83, 86, 88–92, 94, 111, 124]
designed specifically for scRNA-seq data have been developed
356 Yungang Xu and Xiaobo Zhou