Computational Systems Biology Methods and Protocols.7z

be interpreted and reported. These tools can be arranged into a pipeline. Figure 3 shows a ctDNA sequencing data analysis pipeline regularly used by the authors. For Illumina platforms, the tool bcl2fastq converts BCL format files to FASTQ files. Illumina platforms support multiplexing by assigning a different barcode to each sample, so de-multiplexing is performed along with the conversion. Additional tools, e.g., FastQC and Trimmomatic [10], can perform quality control and data filtering on FASTQ files. The authors suggest using AfterQC [11], which is highly optimized for ctDNA sequencing data processing. AfterQC will be introduced in the next section. Many aligners can map DNA sequencing reads to a reference genome, such as bowtie2 [12] and BWA [13]. In our practice, BWA performs better in both alignment quality and speed. BWA is a software package for mapping low-divergent sequences against a large reference genome. It consists of three algorithms: BWA-backtrack, BWA-SW, and

Fig. 3 A typical pipeline for analyzing ctDNA sequencing data. The raw data are de-multiplexed into separate
FASTQ files by sample index and then filtered to remove bad reads. The FASTQ reads are aligned to
the reference genome to generate BAM files. Variant callers scan the sorted and processed BAM files to
generate the original VCF files, which can then be annotated with databases and baseline data. After a complete
VCF is generated, it can be filtered to produce clean variants or visualized for variant validation
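
The first stages of the pipeline in Fig. 3 can be sketched as a list of command lines. This is a minimal illustration and not the authors' actual scripts. The function name build_pipeline, the directory layout, and the simplified FASTQ file names are hypothetical. The choice of the "bwa mem" subcommand, the thread count, and the use of samtools for sorting and indexing are assumptions made for the example. The filtering, variant calling, and annotation stages are left out because this passage does not give their command lines. Nothing is executed; the function only builds argument lists that could be passed to subprocess.run.

```python
import shlex
from pathlib import Path


def build_pipeline(run_dir, sample_sheet, reference, sample, out_dir, threads=4):
    """Return ordered (stage, argv) pairs for one sample.

    Commands are only constructed, never run, so the sketch works
    without any of the tools installed.
    """
    out = Path(out_dir)
    fastq_dir = out / "fastq"
    # Hypothetical, simplified names: real bcl2fastq output names also
    # carry sample-number and lane fields.
    r1 = fastq_dir / f"{sample}_R1.fastq.gz"
    r2 = fastq_dir / f"{sample}_R2.fastq.gz"
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"

    return [
        # BCL -> FASTQ conversion; de-multiplexing is driven by the sample sheet.
        ("demultiplex", ["bcl2fastq",
                         "--runfolder-dir", str(run_dir),
                         "--output-dir", str(fastq_dir),
                         "--sample-sheet", str(sample_sheet)]),
        # Paired-end alignment (assumed subcommand: bwa mem).
        ("align", ["bwa", "mem", "-t", str(threads), "-o", str(sam),
                   str(reference), str(r1), str(r2)]),
        # Coordinate-sort the alignments and write BAM.
        ("sort", ["samtools", "sort", "-o", str(bam), str(sam)]),
        # Index the sorted BAM so that downstream tools can query it by region.
        ("index", ["samtools", "index", str(bam)]),
    ]


if __name__ == "__main__":
    # Hypothetical inputs, for illustration only.
    for stage, argv in build_pipeline("run_folder", "SampleSheet.csv",
                                      "ref.fa", "sample1", "results"):
        print(f"[{stage}] {shlex.join(argv)}")
```

Keeping each stage as a plain argument list makes it easy to log, review, or swap a single step (for example, to insert a filtering command between de-multiplexing and alignment) without touching the others.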

72 Shifu Chen et al.
