Computational Systems Biology Methods and Protocols.7z

produce high-depth data and using the software Fastq_clean to
clean raw sequenced data with quality control. The software
STAR [29] should be used for paired-end alignment and
quantification to produce the raw gene expression matrix. Sample
reduction removes samples with fewer than 100,000 read counts of
nuclear RNA, and feature reduction (optional) removes genes whose
normalized expression values have Pearson correlation coefficients
(PCCs) below 0.6 with the library sizes. Data normalization should
be conducted based on evaluation [30], and cluster analysis uses
t-SNE. Control samples are sequenced to identify the group of
CSCs among the clusters, and control samples from public scRNA-seq
datasets are used to validate the group of CSCs.
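
The two reduction steps can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the dictionary layout, and the choice to compute a sample's read count as the sum of its raw gene counts are assumptions made for illustration. Only the thresholds (100,000 reads and a PCC of 0.6) come from the text.

```python
from math import sqrt

MIN_READS = 100_000  # sample threshold stated in the text
MIN_PCC = 0.6        # gene threshold stated in the text


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns 0.0 when either sequence is constant (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def sample_reduction(counts, min_reads=MIN_READS):
    """Keep samples whose total read count reaches min_reads.

    counts: dict mapping sample name -> dict of gene name -> raw count
    (hypothetical layout; read count is assumed to be the sum of gene counts).
    """
    return {s: genes for s, genes in counts.items()
            if sum(genes.values()) >= min_reads}


def feature_reduction(normalized, library_sizes, min_pcc=MIN_PCC):
    """Keep genes whose PCC with the library sizes reaches min_pcc.

    normalized: dict mapping gene name -> list of normalized expression
    values, one per sample, in the same order as library_sizes.
    """
    return {g: values for g, values in normalized.items()
            if pearson(values, library_sizes) >= min_pcc}
```

In this sketch, sample reduction would be applied to the raw count matrix first, and feature reduction, if used at all, to the normalized matrix of the remaining samples.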

Fig. 5 Using scRNA-seq data to identify cancer stem cells. Samples containing
less than 100,000 read counts of nuclear RNA were filtered out without feature
selection. The ERCC-normalized gene expression matrix contained 665 samples
by 57,955 nuclear genes. A total of ten single cells from distal tissues (in red
color) as control and 655 single cells (in blue color) from colon tumor tissues
were selected to obtain a group of suspected cancer stem cells (in the green
circle). The calculation using the t-SNE method was performed with the package
Rt-SNE v0.11 on the R v3.3.2 platform. Input data used Euclidean distances
without PCA process. The parameters were set as (is_distance=TRUE,
pca=FALSE, perplexity=12, theta=0.5, dims=2, max_iter=3000) for
the t-SNE method
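
The caption states that the t-SNE input was a matrix of Euclidean distances rather than raw or PCA-reduced expression values. As a minimal sketch of that input step only, the following Python code builds a symmetric pairwise distance matrix from per-cell expression vectors. The function name and the list-of-lists layout are assumptions made for illustration; the original analysis was run in R, and the settings dictionary simply restates the values given in the caption.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def euclidean_distance_matrix(samples):
    """Return the symmetric matrix of pairwise Euclidean distances.

    samples: list of equal-length expression vectors, one per cell
    (hypothetical layout chosen for this sketch).
    """
    n = len(samples)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Distance is symmetric, so fill both triangles at once.
            matrix[i][j] = matrix[j][i] = dist(samples[i], samples[j])
    return matrix


# Settings reported in the Fig. 5 caption, collected here for reference only.
TSNE_SETTINGS = {
    "is_distance": True,  # input is a distance matrix, not raw features
    "pca": False,         # no PCA step before t-SNE
    "perplexity": 12,
    "theta": 0.5,
    "dims": 2,
    "max_iter": 3000,
}
```

A matrix built this way would then be passed to a t-SNE implementation that accepts precomputed distances, with the PCA step disabled as in the caption.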

324 Shan Gao
