Computational Systems Biology Methods and Protocols.7z

When you find a possible spot, compare it across multiple
samples.

Calculate the mutation rate of your finding, and compare it
with data in authoritative publications or databases.

2.2 Procedure for NGS Data Analysis

2.2.1 Quality Control

Analyzing next-generation DNA sequencing (NGS) data is more complicated, because the results depend on how the DNA library was constructed and how the adaptors were added. Modern high-throughput sequencers can generate hundreds of millions of sequences in a single run, so before analyzing these sequences to draw biological conclusions, it is advisable to perform some simple quality control (QC) checks to ensure that the raw data look good and contain no problems or biases. Although many sequencers generate a QC report, this is usually not enough, since it focuses only on problems generated by the sequencer itself. FastQC is a widely used tool that provides a more detailed QC report and can spot problems that originate either in the sequencer or in the starting library material. Using FastQC involves the following steps:

Install FastQC on a Linux system
(http://www.bioinformatics.babraham.ac.uk/projects/fastqc/).

Type the command “fastqc [-o output dir] [--(no)extract] [-f
fastq|bam|sam] [-c contaminant file] seqfile1 .. seqfileN”.
Here “output dir” is the output directory, “--extract” or
“--noextract” determines whether the zipped output is
unpacked, “-f” specifies the format of the input, and “-c”
names a file of contaminant sequences to screen against.
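As a rough illustration of this command, the following Python sketch assembles a FastQC argument list from the options described. The helper name `build_fastqc_command`, the file name `sample_R1.fastq`, and the directory `qc_reports` are hypothetical; only the flags themselves come from the usage line. It assumes that `fastqc` is on the PATH and that the output directory already exists.

```python
import shlex


def build_fastqc_command(inputs, output_dir=None, extract=None,
                         file_format=None, contaminant_file=None):
    """Assemble a FastQC argument list (hypothetical helper, not part of FastQC)."""
    cmd = ["fastqc"]
    if output_dir is not None:
        cmd += ["-o", output_dir]          # output path
    if extract is True:
        cmd.append("--extract")            # unpack the zipped report
    elif extract is False:
        cmd.append("--noextract")          # leave the report zipped
    if file_format is not None:
        if file_format not in ("fastq", "bam", "sam"):
            raise ValueError(f"unsupported format: {file_format}")
        cmd += ["-f", file_format]         # format of the input
    if contaminant_file is not None:
        cmd += ["-c", contaminant_file]    # contaminant sequences to screen
    return cmd + list(inputs)


if __name__ == "__main__":
    # File and directory names below are placeholders.
    cmd = build_fastqc_command(["sample_R1.fastq"], output_dir="qc_reports",
                               extract=True, file_format="fastq")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.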

Run FastQC and read the result files:
• The HTML report shows a summary of the modules that
were run and a quick evaluation of whether the results of
each module seem entirely normal (green tick), slightly
abnormal (orange triangle), or very unusual (red cross).
• View the per base sequence quality. Quality is expressed
as a Phred score, Q = -10 log10(p), where “p” stands for
the probability that a base call is wrong. Consider the
lower quartile and the median at each position. If the
lower quartile exceeds 30, the quality can be regarded as
very good.
• View the per sequence quality scores. Normally, if 90% of
the reads have a quality score above 35, the quality can be
regarded as very good.
• View the distribution of A, T, G, and C. In most cases, the
amount of A/T (28%) outweighs that of G/C (22%).
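The checks in this list can be approximated in a few lines of Python. The sketch below is not how FastQC computes its report. It assumes Phred+33 quality encoding and reads already loaded into memory, and every function name is hypothetical. It converts between a Phred score and an error probability using Q = -10 log10(p), summarizes each read position by its lower quartile and median, and reports the fraction of reads whose mean quality exceeds a threshold together with the base composition.

```python
import math
from collections import Counter
from statistics import median, quantiles


def phred_to_error_prob(q):
    """Error probability p for a Phred score Q, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score Q for an error probability p."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=33):
    """Decode one FASTQ quality line; offset=33 assumes Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality_string]


def per_base_summary(quality_strings):
    """Return (lower quartile, median) of the Phred scores at each position.

    Needs at least two reads; positions beyond the shortest read are ignored.
    """
    decoded = [decode_quality(q) for q in quality_strings]
    summary = []
    for column in zip(*decoded):
        q1 = quantiles(column, n=4, method="inclusive")[0]
        summary.append((q1, median(column)))
    return summary


def fraction_reads_above(quality_strings, threshold=35):
    """Fraction of reads whose mean Phred score is above the threshold."""
    means = [sum(s) / len(s) for s in map(decode_quality, quality_strings)]
    return sum(m > threshold for m in means) / len(means)


def base_composition(sequences):
    """Fraction of A, C, G, and T over all reads (other symbols ignored)."""
    counts = Counter("".join(sequences).upper())
    total = sum(counts[b] for b in "ACGT")
    return {b: counts[b] / total for b in "ACGT"}


if __name__ == "__main__":
    # Toy reads for illustration only, not real sequencing data.
    seqs = ["AATT", "GCAT", "ATAT", "TTAA"]
    quals = ["IIII", "II5I", "IIII", "IIII"]  # 'I' -> Q40, '5' -> Q20
    print(per_base_summary(quals))
    print(fraction_reads_above(quals, threshold=35))
    print(base_composition(seqs))
```

With these helpers, the rules of thumb above become simple comparisons: the first element of each per-position pair against 30, and the returned fraction against 0.9.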

6 Keyi Long et al.
