Computational Systems Biology Methods and Protocols

be interpreted and reported. These tools can be arranged into a
pipeline. Figure 3 shows a ctDNA sequencing data analysis
pipeline that the authors use regularly.
For Illumina platforms, the bcl2fastq tool converts BCL files to
FASTQ files. Illumina platforms support multiplexing, in which
different samples are tagged with different barcodes, so
demultiplexing is performed together with the conversion.
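To make the demultiplexing step concrete, the following Python sketch assigns reads that are already in FASTQ format to samples by their index sequence. It is not how bcl2fastq works internally (bcl2fastq reads BCL files directly). It assumes that the index is the last colon-separated field of each read header, as in the Illumina header form `... 1:N:0:ACGTACGT`; the sample sheet, the function names, and the one-mismatch tolerance are illustrative choices. Reads whose index matches no barcode, or more than one, go to an `Undetermined` bin.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One FASTQ record: (header, sequence, "+" line, quality string)
FastqRecord = Tuple[str, str, str, str]


def parse_fastq(text: str) -> List[FastqRecord]:
    """Split FASTQ text into 4-line records (assumes no line wrapping)."""
    lines = text.strip().splitlines()
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]


def hamming(a: str, b: str) -> int:
    """Count mismatching positions; a length difference counts as mismatches."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def assign_sample(index: str, sample_sheet: Dict[str, str],
                  max_mismatches: int = 1) -> str:
    """Return the unique sample whose barcode is within max_mismatches."""
    hits = [name for barcode, name in sample_sheet.items()
            if hamming(index, barcode) <= max_mismatches]
    # Zero hits (unknown index) or several hits (ambiguous) are not assigned.
    return hits[0] if len(hits) == 1 else "Undetermined"


def demultiplex(records: List[FastqRecord], sample_sheet: Dict[str, str],
                max_mismatches: int = 1) -> Dict[str, List[FastqRecord]]:
    """Group records by sample, reading the index from each header."""
    bins: Dict[str, List[FastqRecord]] = defaultdict(list)
    for record in records:
        # Assumption: the index is the last ':'-separated header field,
        # e.g. "@read1 1:N:0:ACGTACGT" -> "ACGTACGT".
        index = record[0].split(":")[-1]
        bins[assign_sample(index, sample_sheet, max_mismatches)].append(record)
    return dict(bins)
```

With a sample sheet such as `{"ACGTACGT": "sample_A", "TTGGCCAA": "sample_B"}`, a read with index `TTGGCCAT` (one mismatch) is assigned to `sample_B`, while an index far from both barcodes ends up in `Undetermined`.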
Additional tools can perform quality control and data filtering on
FASTQ files, e.g., FastQC and Trimmomatic [10]. The authors
suggest using AfterQC [11], which is highly optimized for
processing ctDNA sequencing data. AfterQC is introduced in the
next section.
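As a rough illustration of what FASTQ-level filtering involves, the sketch below decodes Phred+33 quality characters, trims low-quality bases from the 3' end, and discards reads that are too short, contain too many N bases, or have too large a fraction of low-quality bases. The thresholds and function names are hypothetical defaults chosen for the example; this is a generic filter, not the algorithm used by FastQC, Trimmomatic, or AfterQC.

```python
from typing import List, Optional, Tuple


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string (Phred+33 by default) to integers."""
    return [ord(c) - offset for c in qual]


def trim_3prime(seq: str, qual: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose quality is below min_q."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], qual[:end]


def filter_read(seq: str, qual: str, trim_q: int = 20, min_len: int = 15,
                max_n_fraction: float = 0.1, low_q: int = 15,
                max_low_q_fraction: float = 0.4) -> Optional[Tuple[str, str]]:
    """Return the (possibly trimmed) read, or None if it fails a check.

    All thresholds are illustrative, not values taken from any specific tool.
    """
    seq, qual = trim_3prime(seq, qual, trim_q)
    if len(seq) < max(min_len, 1):
        return None  # too short (or empty) after trimming
    if seq.upper().count("N") / len(seq) > max_n_fraction:
        return None  # too many undetermined bases
    scores = phred_scores(qual)
    if sum(q < low_q for q in scores) / len(scores) > max_low_q_fraction:
        return None  # too many low-quality bases overall
    return seq, qual
```

In Phred+33 encoding, `I` decodes to quality 40 and `#` to quality 2, so a read ending in a run of `#` characters is trimmed back to its last high-quality base before the remaining checks are applied.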
Many aligners, such as bowtie2 [12] and BWA [13], can be used to map
DNA sequencing reads to a reference genome. In our experience, BWA
performs better in both alignment quality and speed. BWA is a software
package for mapping low-divergent sequences against a large reference
genome. It consists of three algorithms: BWA-backtrack, BWA-SW, and

Fig. 3 A typical pipeline for analyzing ctDNA sequencing data. The raw data are demultiplexed into separate
FASTQ files by sample index and then filtered to remove bad reads. The FASTQ reads are aligned to the
reference genome to generate BAM files. Variant callers scan the sorted and processed BAM files to
generate initial VCF files, which can then be annotated with databases and baseline data. After a complete
VCF is generated, it can be filtered to produce clean variants or visualized for variant validation.
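The alignment and sorting stages of Fig. 3 can be sketched as shell commands. The helper below is a hypothetical function that only builds the command strings and does not run them; it assumes the BWA-MEM algorithm (`bwa mem`) and samtools are used, with file names and thread counts chosen for illustration. Variant calling, annotation, and VCF filtering are tool-specific and are not shown.

```python
import shlex
from typing import List, Optional


def alignment_commands(ref: str, read1: str, read2: Optional[str],
                       prefix: str, threads: int = 4) -> List[str]:
    """Build (but do not run) shell commands for indexing, alignment, sorting.

    Hypothetical helper: file names and thread count are caller-supplied
    placeholders; variant calling and annotation are not included.
    """
    sorted_bam = f"{prefix}.sorted.bam"
    reads = [read1] + ([read2] if read2 else [])  # paired-end or single-end
    align = shlex.join(["bwa", "mem", "-t", str(threads), ref, *reads])
    # "-" makes samtools sort read the SAM stream from standard input.
    sort = shlex.join(["samtools", "sort", "-@", str(threads),
                       "-o", sorted_bam, "-"])
    return [
        shlex.join(["bwa", "index", ref]),              # once per reference genome
        f"{align} | {sort}",                            # align, then coordinate-sort
        shlex.join(["samtools", "index", sorted_bam]),  # .bai for random access
    ]
```

Once BWA and samtools are installed, each string could be passed to `subprocess.run(cmd, shell=True, check=True)`; the `bwa index` step only needs to be run once for a given reference genome.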


72 Shifu Chen et al.
