Computational Systems Biology Methods and Protocols.7z

on the clustering results. Cluster analysis is also used to identify cell
states, particularly their stages of the cell cycle. The identification of
cells by cluster analysis needs to be validated by the known gene
signatures or biomarkers that best distinguish cell types or states.
These gene signatures were usually discovered using bulk data.
Therefore, the cluster analysis results using scRNA-seq data can
be used to confirm these gene signatures at the single-cell level.
Cluster analysis, as an unsupervised method, can also be used to study
other biological questions when combined with prior knowledge.
For example, a Markov random field (MRF)-based method
has been developed to cluster cells using both spatial and expression
information [27].
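
The validation step described above, checking clusters against known
markers, can be sketched as follows. This is a minimal illustration and
not the chapter's own implementation: the function name, the
one-value-per-cell input, and the expression threshold are all
assumptions made for the example.

```python
import numpy as np


def marker_positive_fraction(marker_expr, labels, threshold=0.0):
    """Return, per cluster, the fraction of cells expressing a marker.

    marker_expr: normalized expression of one marker gene, one value per cell
                 (hypothetical input layout).
    labels:      cluster label of each cell, same length as marker_expr.
    threshold:   a cell counts as marker-positive when its expression is
                 strictly above this value (an assumed cutoff).
    """
    marker_expr = np.asarray(marker_expr, dtype=float)
    labels = np.asarray(labels)
    if marker_expr.shape != labels.shape:
        raise ValueError("marker_expr and labels must have the same length")

    fractions = {}
    for group in np.unique(labels).tolist():
        in_group = labels == group
        # Mean of a boolean array is the fraction of True entries.
        fractions[group] = float((marker_expr[in_group] > threshold).mean())
    return fractions
```

A cluster in which nearly all cells are positive for a known marker is
consistent with the cell type or state that the marker distinguishes;
a mixed cluster suggests that the clustering should be revisited.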
Although scRNA-seq has been successfully used to reveal
tissue heterogeneity and to discover new cell types, it has not yet been
used to resolve a basic problem or rigorously validate a hypothesis in the
biological sciences. One well-known and debated hypothesis is
the existence of cancer stem cells (CSCs) that are responsible for
tumor initiation and growth, possessing properties such as indefinite
self-renewal, slow replication, intrinsic resistance to chemotherapy
and radiotherapy, and an ability to give rise to
differentiated progeny [28]. Here, we present a protocol to discover
and validate CSCs using scRNA-seq. This protocol uses
Smart-seq2 150 bp PE sequencing. Sample reduction removes
samples whose nuclear RNA accounts for fewer than 100,000 reads,
and feature reduction (optional) removes genes with
PCCs less than 0.6 between their normalized expression values and
library sizes. The ERCC-normalized data are used to produce
clusters by the t-SNE method. Control samples are sequenced to
identify the group of CSCs from clusters. Finally, control samples
from public scRNA-seq datasets are used to validate the group of
CSCs. This protocol was first applied to the colon cancer scRNA-seq
dataset (Subheading 2) to identify a special group of cells
comprising 4.73% (31/655) of all the single cells from tumor
tissues (Fig. 5). This group did not contain any control sample,
while each of the other groups contained at least one control sample.
In addition, the cells in this group were CD133-positive (CD133+).
Therefore, these results suggested that this group of cells could
be CSCs. Finally, more control samples from public datasets were
used to validate these CSCs.
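
The filtering and group-selection steps of the protocol above can be
sketched in Python as follows. This is a hedged illustration under
stated assumptions, not the authors' code: the function names and the
genes-by-cells matrix layout are hypothetical, the count matrix is
assumed to hold only reads from nuclear genes, and ERCC normalization
and t-SNE clustering are assumed to be done elsewhere.

```python
import numpy as np


def filter_samples(nuclear_counts, min_reads=100_000):
    """Sample reduction: drop cells with fewer than min_reads nuclear reads.

    nuclear_counts: genes x cells matrix of raw read counts (assumed layout).
    Returns the filtered matrix and the boolean mask of kept cells.
    """
    nuclear_counts = np.asarray(nuclear_counts)
    library_sizes = nuclear_counts.sum(axis=0)
    keep = library_sizes >= min_reads
    return nuclear_counts[:, keep], keep


def filter_genes_by_pcc(norm_expr, library_sizes, min_pcc=0.6):
    """Optional feature reduction: drop genes whose Pearson correlation
    between normalized expression and library size is below min_pcc.

    Genes with constant expression have an undefined PCC; this sketch
    treats it as 0, so they are removed (an assumption).
    """
    norm_expr = np.asarray(norm_expr, dtype=float)
    library_sizes = np.asarray(library_sizes, dtype=float)
    x = norm_expr - norm_expr.mean(axis=1, keepdims=True)
    y = library_sizes - library_sizes.mean()
    denom = np.sqrt((x ** 2).sum(axis=1) * (y ** 2).sum())
    pcc = np.zeros(norm_expr.shape[0])
    defined = denom > 0
    pcc[defined] = (x @ y)[defined] / denom[defined]
    keep = pcc >= min_pcc
    return norm_expr[keep], keep


def groups_without_controls(labels, is_control):
    """Return the cluster labels that contain no control sample.

    labels:     cluster label per cell (for example, from clustering the
                t-SNE embedding).
    is_control: boolean flag per cell, True for control samples.
    """
    labels = np.asarray(labels)
    is_control = np.asarray(is_control, dtype=bool)
    return [
        group
        for group in np.unique(labels).tolist()
        if not is_control[labels == group].any()
    ]
```

For example, `groups_without_controls([0, 0, 1, 1, 2], [False, True,
False, False, True])` returns `[1]`: only group 1 contains no control
sample, which makes it the candidate group to examine further with
markers such as CD133.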

6 Discussion


Single-cell transcriptome sequencing, often referred to as single-cell
RNA sequencing (scRNA-seq), is a powerful tool to investigate
cell types, states, and their dynamics. The successful application of
scRNA-seq requires careful experimental design and data analysis. We
suggest using the Smart-seq2 protocol with PE 150 bp sequencing to

Data Analysis in Single-Cell Transcriptome Sequencing 323