Computational Systems Biology Methods and Protocols.7z

expression of mRNA or noncoding RNA, both of which play important roles in tumor biology; meanwhile, fusion genes can also be identified using RNA-seq [130].

4.2 Bioinformatics Analyses for NGS

Along with the evolution of NGS technologies, the volume of genomic data has grown sharply. It is estimated that ~15 petabytes of sequencing data are generated every year [131], the interpretation of which remains a major challenge. The crux of processing sequencing data lies mainly in the accurate alignment of reads, where the trade-off between accuracy and efficiency must be considered. At present, many alignment tools based on the Burrows-Wheeler transform (BWT), such as BWA [132], Bowtie [133], and SOAP2 [134], are widely used. BWT-based methods compress the reference effectively by building an FM-index [135]. Although BWT-based software provides fast alignment, its accuracy can be compromised. In contrast, hash-based aligners such as Novoalign and MAQ [136] provide higher accuracy at the price of longer running time. Accurate alignment is a good start for processing NGS data, but the subsequent analysis of molecular signatures such as SNVs (single-nucleotide variations), CNVs (copy number variations), and SVs (structural variations) is also crucial for precise interpretation. In SNV calling, a Bayesian model was introduced to evaluate the quality value (QPhred) of each base. A probabilistic framework is then used for genotyping, in which allele frequencies and linkage disequilibrium are taken into account [137, 138]. GATK [139], SAMtools [140], SOAP [134], and other genotyping software are broadly applied for calling SNVs. For CNV detection, the HMM (hidden Markov model) and the sliding window are the two main algorithms, used by software including CNAseg [141], SegSeq [142], and JointSLM [143]. Recently, CONSERTING, a tool based on a regression tree algorithm, was also developed for calling CNVs and exhibits high sensitivity and accuracy [144]. Besides CNVs, structural variations including insertions, deletions, inversions, and duplications are considered important molecular signatures.
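To make the BWT and FM-index idea mentioned above more concrete, the following is a minimal sketch of exact-match backward search in Python. It is a toy illustration only, not the implementation used by BWA, Bowtie, or SOAP2: it builds the suffix array by naive sorting, stores full occurrence tables, and ignores mismatches, gaps, and base qualities. All function names (bwt_via_suffix_array, build_fm_index, backward_search, find_exact) are hypothetical and chosen for illustration.

```python
from collections import Counter


def bwt_via_suffix_array(text):
    """Return (bwt, suffix_array) for text with a '$' terminator appended.

    Naive construction by sorting all suffixes; fine for a toy reference,
    far too slow and memory-hungry for a real genome.
    """
    text += "$"  # '$' sorts before A, C, G, T in ASCII
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # The BWT is the character preceding each sorted suffix (wraps for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    return bwt, sa


def build_fm_index(bwt):
    """Build the two tables that backward search needs.

    c_table[c]: number of characters in the text strictly smaller than c.
    occ[c][i]:  number of occurrences of c in bwt[:i].
    """
    counts = Counter(bwt)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for k in occ:
            occ[k][i + 1] = occ[k][i] + (1 if k == ch else 0)
    return c_table, occ


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one base to the left
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_exact(reference, read):
    """Return sorted 0-based positions where read occurs exactly in reference."""
    bwt, sa = bwt_via_suffix_array(reference)
    c_table, occ = build_fm_index(bwt)
    lo, hi = backward_search(read, c_table, occ, len(bwt))
    return sorted(sa[lo:hi])
```

For example, find_exact("ACGTACGTTAGC", "ACG") reports both occurrences of the read. The number of search steps depends on the read length rather than the reference length, which is one reason index-based aligners are fast; production aligners additionally sample the occurrence table and suffix array to save memory.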
The popular algorithms for detecting SVs can be categorized into four groups: read-pair algorithms such as BreakDancer [145]; split-read algorithms such as CREST [146], Breakpointer [147], and BreakSeek [148]; read-depth algorithms such as GAVPro [149] and GenomeSTRiP [150]; and assembly algorithms such as BreaKmer [151], NovelSeq [152], and SOAPindel [59]. Combinations of two or more algorithms can of course be used to improve the sensitivity and specificity of SV detection. Notably, long-read sequencing platforms such as PacBio can greatly facilitate SV calling.
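As a rough illustration of the read-pair signal named above, the sketch below classifies a single paired-end alignment by its insert size and mate orientation, then counts supporting pairs per class. It is a simplified assumption-laden sketch, not the method of BreakDancer or any other cited tool: the library mean and standard deviation are assumed to be known, the expected orientation is assumed to be forward-reverse ("FR"), and the function names, the threshold k, and min_support are hypothetical.

```python
from collections import Counter


def classify_pair(insert_size, orientation, mean, sd, k=3.0):
    """Label one read pair by the SV type its mapping pattern suggests.

    insert_size: distance spanned by the pair on the reference
    orientation: 'FR' (expected), 'RF' (everted), 'FF' or 'RR' (same strand)
    mean, sd:    assumed insert-size distribution of the library
    k:           hypothetical cutoff in standard deviations
    """
    if orientation in ("FF", "RR"):
        return "inversion"  # mates on the same strand
    if orientation == "RF":
        return "tandem_duplication"  # mates point away from each other
    if insert_size > mean + k * sd:
        return "deletion"  # pair spans more reference than expected
    if insert_size < mean - k * sd:
        return "insertion"  # pair spans less reference than expected
    return "concordant"


def summarize(pairs, mean, sd, k=3.0, min_support=2):
    """Count discordant classes and keep those with enough supporting pairs."""
    counts = Counter(classify_pair(size, ori, mean, sd, k) for size, ori in pairs)
    return {
        label: n
        for label, n in counts.items()
        if label != "concordant" and n >= min_support
    }
```

A real caller would also cluster discordant pairs by genomic position and weigh mapping quality before reporting a breakpoint; this sketch only shows the per-pair classification step.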

402 Heng Xu and Yang Shu
