Computational Systems Biology Methods and Protocols

has some drawbacks. The first drawback is that local trimming uses only the quality information for trimming and cannot utilize global statistical information to discover abnormal cycles. The second drawback is that local trimming results in unaligned trimming: duplicated reads may be trimmed differently, which can cause deduplication tools such as Picard to fail. Most of these deduplication tools detect duplicates only by clustering reads with identical mapping positions, so differently trimmed copies of the same fragment may no longer be grouped together. In contrast, AfterQC implements a global trimming strategy, i.e., it trims all reads in the same manner. An algorithm determines how many cycles to trim at the front and tail, based on the segmentation of the per-cycle base content curves and base quality curves.
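The segmentation algorithm itself is not spelled out here, so the following is only a minimal Python sketch of the global-trimming idea under a simplified, hypothetical rule. A cycle counts as abnormal if any base fraction deviates from the run-wide median by more than `content_tol`, or if its mean Phred quality falls below `min_mean_qual`. Abnormal cycles are peeled off from both ends, and every read is then cut by the same amounts. The function names, thresholds, and Phred+33 quality encoding are assumptions for illustration, not AfterQC's actual code or API.

```python
from statistics import mean, median

BASES = "ACGT"


def per_cycle_stats(reads, quals, phred_offset=33):
    """Per-cycle base fractions and mean Phred quality over the shared cycles."""
    n_cycles = min(len(r) for r in reads)
    content, mean_q = [], []
    for c in range(n_cycles):
        column = [r[c] for r in reads]
        content.append({b: column.count(b) / len(column) for b in BASES})
        mean_q.append(mean(ord(q[c]) - phred_offset for q in quals))
    return content, mean_q


def find_trim_cycles(reads, quals, content_tol=0.1, min_mean_qual=20):
    """Hypothetical rule: return (front, tail) cycle counts to trim from all reads."""
    content, mean_q = per_cycle_stats(reads, quals)
    n = len(content)
    # Run-wide reference composition: median fraction of each base across cycles.
    ref = {b: median(cc[b] for cc in content) for b in BASES}

    def abnormal(c):
        skew = max(abs(content[c][b] - ref[b]) for b in BASES)
        return skew > content_tol or mean_q[c] < min_mean_qual

    front = 0
    while front < n and abnormal(front):
        front += 1
    tail = 0
    while tail < n - front and abnormal(n - 1 - tail):
        tail += 1
    return front, tail


def global_trim(reads, front, tail):
    """Cut the same number of cycles from the front and tail of every read."""
    return [r[front:len(r) - tail] for r in reads]


# Toy data: cycle 0 is all G (biased content); the last cycle has quality 2 ('#').
reads = ["GACGTA", "GCGTAC", "GGTACG", "GTACGT"]
quals = ["IIIII#"] * 4
front, tail = find_trim_cycles(reads, quals)
print(front, tail)                       # 1 1
print(global_trim(reads, front, tail))   # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

Because every read loses exactly the same cycles, duplicated reads are still trimmed identically, which is what position-based deduplication relies on.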
A major advantage of AfterQC is its overlapping analysis. Let $T$ denote the length of a sequenced DNA template and $S$ the read length of paired-end sequencing. Then the pair of reads will totally overlap if $T \le S$, will overlap by $2S - T$ bases if $S < T < 2S$, and will not overlap if $T \ge 2S$. AfterQC checks how each pair of reads overlaps based on edit distance optimization.
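A minimal Python sketch of the three overlap cases, together with a simplified version of the offset search outlined in the sentences that follow. Several details are assumptions, not taken from the text: R2 is reverse-complemented before it is placed under R1 (as is usual for paired-end mates read from opposite strands); only offsets O ≥ 0 are scanned (the T ≥ S case); edit distance is plain Levenshtein distance; and ties are broken toward the smallest offset, i.e., the longest overlap. For equal-length mates an offset O corresponds to a template length T = S + O, so the overlap S − O equals 2S − T. All function names and the `min_overlap` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def expected_overlap(T, S):
    """Overlap, in template bases, of two S-long mates read from a T-long template."""
    if T <= S:
        return T          # total overlap: each mate spans the whole template
    if T < 2 * S:
        return 2 * S - T  # partial overlap
    return 0              # T >= 2S: the mates do not overlap


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def edit_distance(a, b):
    """Levenshtein distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match / substitution
        prev = cur
    return prev[-1]


def best_offset(r1, r2, min_overlap=5):
    """Place rc(r2) under r1 at each offset O >= 0 and return (O, distance)
    with minimal edit distance; ties go to the smallest O (longest overlap)."""
    rc2 = reverse_complement(r2)
    best = None
    for o in range(len(r1) - min_overlap + 1):
        seg1 = r1[o:]              # part of R1 covered by the shifted rc(R2)
        seg2 = rc2[:len(seg1)]     # vertically aligned part of rc(R2)
        d = edit_distance(seg1, seg2)
        if best is None or d < best[1]:
            best = (o, d)
    return best


template = "ACGTTGCAAGGCTA"              # T = 14
r1 = template[:10]                       # S = 10, forward strand
r2 = reverse_complement(template[4:])    # mate read from the opposite strand
o, d = best_offset(r1, r2)
print(o, d, expected_overlap(len(r1) + o, len(r1)))  # 4 0 6
```

Scanning offsets in increasing order with a strict `<` comparison keeps the longest exact overlap when several offsets reach the same minimal distance, which avoids preferring short, spuriously matching overlaps.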
For a pair of reads $R_1$ and $R_2$, let $O$ be the offset at which we place $R_2$ under $R_1$; we then have vertically aligned subsequences $R_1^{O}$ and $R_2^{O}$ and can calculate their edit distance $\mathrm{ed}(R_1^{O}, R_2^{O})$. The method optimizes the offset $O$ to obtain the minimal edit distance,

Table 2
Feature comparison of FastQC, Trimmomatic, Cutadapt, and AfterQC

| Feature | FastQC | Trimmomatic | Cutadapt | AfterQC |
|---|---|---|---|---|
| Quality control | Rich functionality | Limited functionality | Limited functionality | Rich functionality |
| Auto trimming | None | Read by read | Read by read | Global trimming |
| Adapter cutting | None | Single-end/paired-end | Single-end/paired-end | Paired-end only |
| PolyX filtering | None | None | None | Supported |
| Figure plotting | Static | Static | Static | Interactive |
| Overlap analysis | None | For adapter cutting only | None | Supported, with error correction |
| Sequence error profiling | None | None | None | Supported |
| Bubble detection | None | None | None | Supported |
| Programming language | Java | Java | Python | Python, C |
| Speed | Fast | Fast | Fast | Fast only for single-end data |

Bioinformatics Analysis for Cell-Free Tumor DNA Sequencing Data 77