Computational Systems Biology Methods and Protocols.7z

expression of mRNA or noncoding RNA, both of which play
important roles in tumor biology; meanwhile, fusion genes can
also be identified by using RNA-seq [130].

4.2 Bioinformatics Analyses for NGS


Along with the evolution of NGS technologies, the size of genomic
data has grown sharply. It is estimated that ~15 petabytes of
sequencing data are generated every year [131], the interpretation
of which remains a big challenge. The crux of processing sequencing
data lies mainly in the accurate alignment of reads, where the
trade-off between accuracy and efficiency should be considered.
At present,
many alignment tools such as BWA [132], Bowtie [133], and
SOAP2 [134], which are based on the Burrows-Wheeler transform
(BWT) algorithm, are widely used. The BWT can compress the
reference effectively by building an FM-index [135]. Although
BWT-based software provides fast alignment, its accuracy can be
compromised. In contrast, hash-based algorithms such as
Novoalign and MAQ [136] can provide higher accuracy at the
price of longer running time. Accurate alignment is a good start
for processing NGS data, but the subsequent analysis of
molecular signatures such as SNVs (single-nucleotide variations),
CNVs (copy number variations), and SVs (structural variations)
is also crucial for precise interpretation. In the process of
SNV calling, a Bayesian model was introduced to evaluate the
quality value (QPhred) of each single base. Meanwhile, a
probabilistic framework was used for genotyping, in which
information on allele frequencies and linkage disequilibrium was
considered in the calculation [137, 138]. GATK [139],
SAMtools [140], SOAP [134], and other genotyping software are
broadly applied for calling SNVs. For CNV detection, HMM (hidden Markov
model) and sliding window are the two main algorithms used by
software including CNAseg [141], SegSeq [142], and JointSLM
[143]. Recently, CONSERTING, a tool based on a regression
tree algorithm, was also developed for calling CNVs and exhibits
high sensitivity and accuracy [144]. Besides CNVs, structural
variations including insertions, deletions, inversions, and
duplications are considered important molecular signatures. The popular
algorithms for detecting SVs can be categorized into four groups:
read-pair algorithms like BreakDancer [145]; split-read algorithms
like CREST [146], Breakpointer [147], and BreakSeek [148];
read-depth algorithms like GAVPro [149] and GenomeSTRiP
[150]; and assembly algorithms like BreaKmer [151], NovelSeq
[152], and SOAPindel [59]. Of course, two or more algorithms can
be combined to improve the sensitivity and specificity of SV
detection. Notably, long-read sequencing platforms like PacBio
can greatly facilitate SV calling.
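
To make the BWT and FM-index idea mentioned above more concrete, the following is a minimal Python sketch of exact-match lookup by backward search. It is an illustration under stated assumptions, not the implementation used by BWA, Bowtie, or SOAP2: the suffix array is built naively, the occurrence table is stored uncompressed, mismatches and gaps are not handled, and the function names (build_fm_index, find_exact) are hypothetical.

```python
from collections import Counter


def suffix_array(text):
    """Naive suffix array: sort suffix start positions (toy inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_fm_index(reference):
    """Build a toy FM-index (suffix array, BWT, C table, Occ table)."""
    text = reference + "$"  # sentinel; '$' sorts before A, C, G, T in ASCII
    sa = suffix_array(text)
    # BWT: the character preceding each sorted suffix (wraps around to '$').
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, bwt, c_table, occ


def find_exact(pattern, fm):
    """Return sorted 0-based start positions of exact matches of pattern."""
    sa, bwt, c_table, occ = fm
    lo, hi = 0, len(bwt)  # half-open range of suffix-array rows
    for ch in reversed(pattern):  # backward search: last character first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    fm = build_fm_index("ACGTACGTTACG")
    print(find_exact("ACG", fm))
```

Each step of the loop costs two table lookups regardless of reference length, which is one reason BWT-based aligners are fast; production tools additionally sample the tables to save memory and extend the search to tolerate mismatches.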
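
The role of base quality values and allele frequencies in SNV genotyping, as described above, can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the model of GATK, SAMtools, or SOAP: it assumes a single biallelic site, independent reads, a Hardy-Weinberg prior derived from an assumed alternate-allele frequency, and it ignores linkage disequilibrium and mapping quality. The function names are hypothetical; the only standard piece is the Phred definition Q = -10 log10(P_error).

```python
import math


def phred_to_error(q):
    """Convert a Phred quality value to a base-call error probability."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Convert an error probability to a Phred quality value."""
    return -10 * math.log10(p)


def genotype_posteriors(pileup, ref, alt, alt_freq):
    """Posterior over diploid genotypes at one biallelic site.

    pileup   : list of (base, phred_quality) tuples covering the site
    ref, alt : distinct reference and alternate alleles, e.g. "A" and "G"
    alt_freq : assumed population frequency of alt, strictly between 0 and 1
    """
    if not 0 < alt_freq < 1:
        raise ValueError("alt_freq must be strictly between 0 and 1")
    # Hardy-Weinberg prior (an assumption of this sketch).
    priors = {
        ref + ref: (1 - alt_freq) ** 2,
        ref + alt: 2 * alt_freq * (1 - alt_freq),
        alt + alt: alt_freq ** 2,
    }
    log_post = {}
    for genotype, prior in priors.items():
        total = math.log(prior)
        for base, q in pileup:
            e = phred_to_error(q)
            # Each read samples one of the two alleles with probability 0.5;
            # a miscalled base is assumed equally likely to be any other base.
            p = sum(0.5 * ((1 - e) if base == allele else e / 3)
                    for allele in genotype)
            total += math.log(p)
        log_post[genotype] = total
    # Normalise in log space for numerical stability.
    top = max(log_post.values())
    weights = {g: math.exp(v - top) for g, v in log_post.items()}
    norm = sum(weights.values())
    return {g: w / norm for g, w in weights.items()}


if __name__ == "__main__":
    reads = [("A", 30)] * 5 + [("G", 30)] * 5
    post = genotype_posteriors(reads, "A", "G", alt_freq=0.01)
    print(max(post, key=post.get))
```

With five reads supporting each allele at Q30, the heterozygous genotype dominates even though the prior strongly favors the homozygous reference, which shows how the read evidence and the allele-frequency prior are traded off.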

402 Heng Xu and Yang Shu
