Computational Systems Biology Methods and Protocols.7z

produce high-depth data and using the software Fastq_clean to clean raw sequenced data with quality control. The software STAR [29] is used for paired-end alignment and quantification to produce the raw gene expression matrix. Sample reduction removes samples with fewer than 100,000 read counts of nuclear RNA, and feature reduction (optional) removes genes whose Pearson correlation coefficients (PCCs) between their normalized expression values and library sizes are less than 0.6. Data normalization should be conducted based on evaluation [30], and cluster analysis uses t-SNE. Control samples are sequenced to identify the group of CSCs among the clusters, and control samples from public scRNA-seq datasets are used to validate that group.
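
The two reduction steps can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names, the dictionary-based data layout, and the choice to treat a gene with constant expression as uncorrelated are assumptions made here for illustration, and the library size is assumed to be the sum of a sample's nuclear-gene read counts. Only the two thresholds (100,000 reads and a PCC of 0.6) come from the text.

```python
import math

MIN_READS = 100_000  # sample-reduction threshold stated in the text
MIN_PCC = 0.6        # feature-reduction threshold stated in the text


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant vector counts as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def reduce_samples(counts, min_reads=MIN_READS):
    """Keep samples whose library size reaches min_reads.

    counts maps a sample name to its list of per-gene nuclear read counts
    (hypothetical layout); the library size is assumed to be their sum.
    """
    return {s: v for s, v in counts.items() if sum(v) >= min_reads}


def reduce_features(normalized, library_sizes, min_pcc=MIN_PCC):
    """Return the genes whose PCC with the library sizes is at least min_pcc.

    normalized maps a gene name to its normalized expression values, listed
    in the same sample order as library_sizes (hypothetical layout).
    """
    return [g for g, values in normalized.items()
            if pearson(values, library_sizes) >= min_pcc]
```

In this sketch, sample reduction would run on the raw count matrix first, and feature reduction, if used at all, would run on the normalized values of the samples that remain.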

Fig. 5 Using scRNA-seq data to identify cancer stem cells. Samples containing fewer than 100,000 read counts of nuclear RNA were filtered out, and feature selection was not performed. The ERCC-normalized gene expression matrix contained 665 samples by 57,955 nuclear genes. A total of ten single cells from distal tissues (in red) as controls and 655 single cells (in blue) from colon tumor tissues were used to obtain a group of suspected cancer stem cells (in the green circle). The t-SNE calculation was performed with the package Rt-SNE v0.11 on the R v3.3.2 platform. The input data were Euclidean distances, without PCA preprocessing. The parameters were set as (is_distance=TRUE, pca=FALSE, perplexity=12, theta=0.5, dims=2, max_iter=3000) for the t-SNE method.
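
For readers working outside R, the caption's setup (a precomputed Euclidean distance matrix passed to t-SNE with no PCA step) can be approximated in Python. The sketch below swaps in scikit-learn's `TSNE` for the R package named in the caption, so the embedding will not be numerically identical to the figure. The function names, the `seed` argument, and the mapping of `theta` to scikit-learn's `angle` parameter are assumptions made here; the parameter values are those listed in the caption.

```python
import inspect
import math


def euclidean_distance_matrix(rows):
    """Pairwise Euclidean distances between samples (one row per sample)."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(rows[i], rows[j])
            dist[i][j] = dist[j][i] = d
    return dist


def embed_tsne(dist, perplexity=12, theta=0.5, dims=2, max_iter=3000, seed=0):
    """Run t-SNE on a precomputed distance matrix (hypothetical helper).

    NumPy and scikit-learn are imported lazily so the distance helper above
    works without them. perplexity must be smaller than the sample count.
    """
    import numpy as np
    from sklearn.manifold import TSNE

    kwargs = dict(
        n_components=dims,      # dims=2 in the caption
        perplexity=perplexity,  # perplexity=12 in the caption
        angle=theta,            # assumed counterpart of theta=0.5
        metric="precomputed",   # counterpart of is_distance=TRUE
        init="random",          # PCA initialization is unavailable here
        random_state=seed,      # assumption: fixed seed for repeatability
    )
    # Older scikit-learn releases call the iteration count n_iter.
    params = inspect.signature(TSNE).parameters
    kwargs["max_iter" if "max_iter" in params else "n_iter"] = max_iter
    return TSNE(**kwargs).fit_transform(np.asarray(dist))
```

A call would look like `embed_tsne(euclidean_distance_matrix(matrix_rows))`, where `matrix_rows` is a hypothetical list holding one normalized expression vector per sample.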

324 Shan Gao
