Computational Systems Biology Methods and Protocols.7z

put and decreasing cost. The rise of RNA-seq methodologies has greatly deepened our understanding of embryonic development [2], carcinogenesis [3], cell differentiation [4], and many other research areas.

1.1 An Overview of RNA-seq Workflow

A complete RNA-seq procedure consists of an experimental stage and an analysis stage. Although several sequencing protocols exist for RNA-seq, the general steps and outputs of the experimental stage are similar. Briefly, RNA molecules with poly-A tails are first isolated by oligo-dT priming [5]. Alternatively, non-rRNAs are enriched by rRNA depletion [6]. The resulting RNAs are fragmented and then reverse-transcribed into short (200–1000 bp) cDNA fragments, which are attached to sequencing adaptors and sequenced from one end or both ends. Several NGS technologies, including Illumina [7] and SOLiD [8], can be used for RNA-seq to generate millions or billions of short reads representing these cDNA fragments.

The analysis stage of RNA-seq begins with mapping reads to the reference genome. Because eukaryotic genomes contain introns, RNA-seq reads aligned to the genome often have gaps of varying lengths, up to hundreds of thousands of base pairs, which makes DNA sequence mapping tools generally unsuitable for direct use in RNA-seq. Widely used RNA-seq mapping tools include TopHat [9], SOAP [10], and GSNAP [11]. There are also programs, such as Sailfish [12] and Kallisto [13], that map reads onto a reference transcriptome rather than a reference genome, to circumvent the gap problem and to reduce computation time. Read mapping is followed by the quantification of each RNA species, which is either provided by the reference transcriptome or assembled de novo from reads. Most mapping software packages also perform the quantification step. Generally, the final output of the mapping and quantification steps can be described as a matrix in which each column is a sample and each row is a gene or a splicing isoform of a transcript. This matrix, often called the expression profile, is the starting point of downstream analysis of RNA-seq datasets. The expression profile contains rich transcriptomic information regarding the tested samples.
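As an illustration of the matrix layout described above, the following minimal sketch assembles per-sample counts into a gene-by-sample table using only the Python standard library. The sample names, gene names, counts, and the helper build_expression_matrix are hypothetical and chosen for illustration; real quantification tools write their own output formats, which would need to be parsed first. Filling genes absent from a sample with zero is also an assumption of this sketch.

```python
# Hypothetical per-sample quantification results: gene ID -> read count.
# All names and numbers are made up for illustration only.
sample_counts = {
    "sample_A": {"geneX": 120, "geneY": 0, "geneZ": 35},
    "sample_B": {"geneX": 98, "geneZ": 60},
    "sample_C": {"geneY": 12, "geneZ": 41},
}


def build_expression_matrix(per_sample):
    """Return (genes, samples, matrix): one row per gene, one column per sample.

    Genes that are missing from a sample are assigned a count of 0
    (an assumption of this sketch).
    """
    samples = sorted(per_sample)
    genes = sorted({gene for counts in per_sample.values() for gene in counts})
    matrix = [[per_sample[s].get(g, 0) for s in samples] for g in genes]
    return genes, samples, matrix


genes, samples, matrix = build_expression_matrix(sample_counts)

# Print the expression profile as a tab-separated table.
print("gene\t" + "\t".join(samples))
for gene, row in zip(genes, matrix):
    print(gene + "\t" + "\t".join(str(value) for value in row))
```

Each row of the printed table corresponds to a gene and each column to a sample, which is the shape that downstream steps such as normalization and clustering typically take as input.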
How to draw biological meaning from the expression profile, however, is highly contingent on the specific research background. A one-size-fits-all analytical workflow does not exist. For example, different normalization methods have been proposed to alleviate technical variations and batch effects among samples, each with its strengths and drawbacks [14]. Theoretically predicting which method will give the best results is a challenging or sometimes impossible task [14]. RNA-seq data are often analyzed by clustering methods to discover co-expressed gene groups or sample subclasses that share expression patterns. Commonly used clustering algorithms, including hierarchical clustering, principal

168 Chao Zhang et al.
