Computational Systems Biology Methods and Protocols.7z

analysis includes raw data cleaning with quality control [3], read alignment, generation of read counts, normalization, data filtering with quality control, and downstream analyses. Currently, a major application of downstream analysis is the identification of cell types or states by cluster analysis. Although many existing tools for processing bulk RNA-seq data can be used to process scRNA-seq data with or without modification, scRNA-seq data analysis poses several unique computational challenges that necessitate the development of entirely new analytical methods. In this chapter, we introduce and discuss the current state of research on scRNA-seq data normalization (Subheading 3) and cluster analysis (Subheading 5), the two most important challenges in scRNA-seq data analysis. We also present a schema that generalizes four fundamental problems (Subheading 4). Preliminary results from our previous studies of these problems are provided to guide researchers in future studies. In particular, we present a protocol to discover and validate cancer stem cells (CSCs), which was first implemented by Lin Liu et al. using a colon cancer scRNA-seq dataset (Subheading 2).
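As a rough illustration of how the read-count, normalization, and filtering steps listed above follow one another, the minimal Python sketch below applies counts-per-million scaling and a simple detection filter to a made-up count matrix. The gene names, counts, the `min_cells` threshold, and all function names are hypothetical. Counts-per-million is used only as a familiar stand-in and is not the normalization approach discussed in Subheading 3.

```python
import math

# Hypothetical toy read counts: rows are genes, columns are three cells.
counts = {
    "geneA": [10, 0, 30],
    "geneB": [0, 0, 5],
    "geneC": [90, 50, 65],
}

def library_sizes(counts):
    """Total read count per cell (column sums)."""
    n_cells = len(next(iter(counts.values())))
    return [sum(row[j] for row in counts.values()) for j in range(n_cells)]

def cpm(counts):
    """Scale each cell to counts per million (illustrative only).

    Assumes every cell has a nonzero total count.
    """
    sizes = library_sizes(counts)
    return {g: [1e6 * c / s for c, s in zip(row, sizes)]
            for g, row in counts.items()}

def filter_genes(expr, min_cells=2):
    """Keep genes detected (value > 0) in at least `min_cells` cells."""
    return {g: row for g, row in expr.items()
            if sum(v > 0 for v in row) >= min_cells}

normalized = cpm(counts)              # normalization
filtered = filter_genes(normalized)   # filtering: geneB is seen in one cell only
# Log-transform as a typical input to downstream analyses such as clustering.
log_expr = {g: [math.log2(v + 1) for v in row] for g, row in filtered.items()}
```

In this toy matrix, `geneB` is detected in a single cell and is dropped by the filter, while `geneA` and `geneC` are retained for downstream analysis.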

2 Experiment Design and Data Quality Control

Six commonly used scRNA-seq protocols are CEL-seq2, Drop-seq, MARS-seq, SCRB-seq, Smart-seq, and Smart-seq2 (Table 1), the performance of which has been evaluated in a comparative study [4]. The performance measures included sensitivity (i.e., the probability of capturing and converting a particular mRNA transcript present in a single cell into a cDNA molecule present in the library), accuracy (i.e., how well the read quantification corresponds to the actual concentration of mRNA), precision (i.e., the technical variation of the quantification), cost, etc. The authors of the comparative study concluded that Smart-seq2 was the most sensitive and accurate protocol, with a cost efficiency similar to that of the five other protocols. In this chapter, we demonstrate all the research results from our previous studies using a colon cancer scRNA-seq dataset (SRA: SRP113436) provided by Lin Liu et al. This dataset includes 831 single-cell samples and 18 bulk samples sequenced using the Smart-seq2 protocol. The 831 single-cell samples comprise 814 single cells from colon tumor tissues and 17 single cells from distal tissues (>10 cm) as controls. The 18 bulk samples comprise nine samples from colon tumor tissues and nine samples from distal tissues. Besides the choice among the six protocols, scRNA-seq experiment design needs to consider other factors such as sequencing length and depth. The sequencing length determines the alignment quality and thus affects the accuracy of quantitative analysis. In addition, paired-end (PE) reads have advantages over single-end (SE) reads for genome

312 Shan Gao

