Computational Systems Biology Methods and Protocols.7z

$\mathrm{ed}(R^1_{o-1}, R^2_{o-1}) > \mathrm{ed}(R^1_o, R^2_o) < \mathrm{ed}(R^1_{o+1}, R^2_{o+1})$. Figure 4 shows an example of how AfterQC's overlapping analysis works. Based on overlapping analysis, AfterQC can detect mismatches. If a mismatched pair has unbalanced quality scores, meaning that one base has a high quality score (i.e., >Q30) and the other has a very low quality score (i.e., <Q15), AfterQC can automatically correct the low-quality base. If the quality scores are not unbalanced, AfterQC can mask the two bases by changing them to N or by assigning zero quality scores to them. Based on the mismatches, AfterQC can estimate the sequencing error rate and profile the distribution of sequencing error transforms (for example, how many bases are T but were sequenced as C).

Overlapping analysis can also be used for automatic adapter cutting. During overlapping analysis, we obtain the optimal offset O for the best local alignment of each pair. The overlapping length of the pair can be calculated directly from the offset O. If O is negative, the bases outside the overlapping region are considered part of the adapter sequences and are cut automatically.

AfterQC is an open source tool: https://github.com/OpenGene/AfterQC. It is implemented in Python and C++, with PyPy support enabled. AfterQC generates a standalone HTML report for each input, with figures plotted by Plotly. A sample report can be found at: http://opengene.org/AfterQC/report.html.
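The overlap search, mismatch handling, and adapter cutting described above can be illustrated with a minimal Python sketch. It is not AfterQC's actual code. The function names, the offset convention (the offset is the position in read 1 where the reverse complement of read 2 starts, and may be negative), and the `min_overlap` and `max_mismatches` parameters are all assumptions made for illustration. For brevity the sketch counts substitutions only (Hamming distance) instead of computing a true edit distance. Only the Q30 and Q15 thresholds are taken from the text.

```python
def overlap_span(len1, len2, offset):
    """Return (start1, start2, length) of the overlap for a given offset.

    Assumed convention: r2rc[0] lands on r1[offset]; offset may be negative.
    """
    start1 = max(offset, 0)
    start2 = max(-offset, 0)
    length = min(len1 - start1, len2 - start2)
    return start1, start2, length


def find_offset(r1, r2rc, min_overlap=8, max_mismatches=1):
    """Pick the offset with the fewest mismatches (ties: longest overlap).

    Hamming distance stands in for the edit distance used in the text.
    Returns None when no offset satisfies the (hypothetical) thresholds.
    """
    best = None  # (mismatches, -overlap_length, offset)
    for offset in range(-(len(r2rc) - min_overlap), len(r1) - min_overlap + 1):
        s1, s2, n = overlap_span(len(r1), len(r2rc), offset)
        if n < min_overlap:
            continue
        mism = sum(a != b for a, b in zip(r1[s1:s1 + n], r2rc[s2:s2 + n]))
        if mism <= max_mismatches:
            cand = (mism, -n, offset)
            if best is None or cand < best:
                best = cand
    return None if best is None else best[2]


def resolve_mismatches(r1, q1, r2rc, q2, offset, high=30, low=15):
    """Correct unbalanced mismatches; mask balanced ones with N.

    q1 and q2 are lists of integer Phred scores. Quality values are left
    unchanged here, which is a simplification of this sketch.
    """
    b1, b2 = list(r1), list(r2rc)
    s1, s2, n = overlap_span(len(b1), len(b2), offset)
    corrected = masked = 0
    for k in range(n):
        i, j = s1 + k, s2 + k
        if b1[i] == b2[j]:
            continue
        if q1[i] > high and q2[j] < low:
            b2[j] = b1[i]          # trust the high-quality base in read 1
            corrected += 1
        elif q2[j] > high and q1[i] < low:
            b1[i] = b2[j]          # trust the high-quality base in read 2
            corrected += 1
        else:
            b1[i] = b2[j] = "N"    # qualities not unbalanced: mask both
            masked += 1
    return "".join(b1), "".join(b2), corrected, masked


def trim_adapters(r1, r2rc, offset):
    """With a negative offset, keep only the overlapping region of each read."""
    if offset >= 0:
        return r1, r2rc
    s1, s2, n = overlap_span(len(r1), len(r2rc), offset)
    return r1[:n], r2rc[s2:s2 + n]


# Toy pair: a high-quality A in read 1 faces a low-quality T in read 2.
r1, r2rc = "ACGTTGCAAGGC", "CATGGCTCATGA"
q1, q2 = [38] * 12, [38] * 12
q2[2] = 10
offset = find_offset(r1, r2rc, min_overlap=6)
print(offset, resolve_mismatches(r1, q1, r2rc, q2, offset))
```

In this toy pair the low-quality T is replaced by the high-quality A, mirroring the situation shown in Fig. 4. A production implementation would also need to handle indels, reads of unequal quality encoding, and the update of quality scores after correction, none of which this sketch attempts.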

2.2 Molecular Barcoding Sequencing and Its Data Analysis

The potential of NGS deep sequencing for ctDNA was hampered by systemic errors introduced by PCR and sequencing methods [27, 28]. Molecular indexing combined with deep sequencing holds great promise to break the limit imposed by PCR and sequencing errors and enables the detection of rare and ultra-rare mutations [29, 30]. Tagging individual templates with molecular barcodes has been proposed and reported since 2007 [31]. The molecular barcodes or molecular indexes have been given various names, such as unique identifiers (UID) [29], unique molecular identifiers (UMI) [32], primer ID [30], duplex barcodes [33], etc. They are usually designed as a string of totally random nucleotides (such as NNNNNNNN), partially degenerate nucleotides (such as

Fig. 4 How AfterQC's overlapping analysis works. The edit distance of the overlapped subsequences is 1. A mismatch pair is found with a high-quality base A and a very low-quality base T. This T will be recognized as wrongly represented and can be corrected.

78 Shifu Chen et al.
