Computational Systems Biology: Methods and Protocols


2.2.2 Data Analysis

For data analysis, we use the Illumina system as an example. Illumina offers a variety of next-generation sequencing (NGS) data analysis software tools, including push-button tools for DNA sequence alignment, variant calling, and data visualization. Data generated on Illumina sequencing instruments are automatically transferred to, and stored securely in, BaseSpace Sequence Hub. The analysis should be carried out as follows:


Primary Analysis

  1. Assess the quality of the results. If the data are of poor quality, any further analysis will be meaningless.

  2. Search for your target fragments.

  3. Real-time analysis (RTA) and base calling are performed by the Illumina system.
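Sequencing quality is usually expressed as Phred scores (the Q score reported by Illumina software). As a rough, non-authoritative illustration of the quality check in step 1, the sketch below reads FASTQ records and counts the reads whose mean Phred score reaches a threshold. It is not part of the Illumina software: the Phred+33 quality encoding, the Q30 cutoff, and all function names are assumptions made for illustration.

```python
from statistics import mean

def parse_fastq(text):
    """Yield (name, sequence, quality) from 4-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text does not contain whole 4-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        yield header[1:], seq, qual

def phred_scores(qual, offset=33):
    """Decode an ASCII quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]

def passes_quality(qual, min_mean_q=30):
    """True if the read's mean Phred score is at least min_mean_q (Q30 assumed)."""
    return bool(qual) and mean(phred_scores(qual)) >= min_mean_q

def summarize_quality(text, min_mean_q=30):
    """Return (number of reads passing the threshold, total number of reads)."""
    reads = list(parse_fastq(text))
    passed = sum(passes_quality(qual, min_mean_q) for _, _, qual in reads)
    return passed, len(reads)

example = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n####\n"
print(summarize_quality(example))  # (1, 2)
```

In this toy input, "I" decodes to Q40 and "#" to Q2, so only the first read passes. Real pipelines usually look at per-cycle and per-tile metrics as well, not only a per-read mean.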


Secondary Analysis

  1. After real-time analysis (RTA) in the primary analysis, use the MiSeq Reporter software to analyze the data.

  2. After opening MiSeq Reporter, click “analysis” to see the available modules, including A (Assembly), E (Enrichment), G (Generate FASTQ), M (Metagenomics), and R (Resequencing).

  3. Choose the analysis module you need and run it.

  4. Read the MiSeq Reporter report. For example, if you choose module R, the report produced after the resequencing run shows a list of samples, a table of targets, a list of SNPs with their corresponding scores, the Q scores, and the sequencing depth.

  5. The output is provided in demultiplexed (.demux) and FASTQ (.fastq) formats, which you can analyze further with third-party software.

  6. Compare the results with the reference genome.
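Comparing reads with a reference genome is normally done with dedicated aligners and variant callers. The toy sketch below only illustrates the idea behind a report that lists SNPs together with sequencing depth. It assumes the reads have already been aligned without gaps and are given as hypothetical (0-based start, sequence) pairs. It counts the bases at each reference position and reports positions where a non-reference base dominates. The depth and allele-fraction thresholds are arbitrary illustrations, not MiSeq Reporter defaults.

```python
from collections import Counter

def pileup(reference, aligned_reads):
    """Count observed bases at each reference position.

    aligned_reads: iterable of (start, sequence) with 0-based start and
    no insertions or deletions (a simplifying assumption).
    """
    columns = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                columns[pos][base] += 1
    return columns

def call_snps(reference, aligned_reads, min_depth=3, min_alt_fraction=0.8):
    """Return (position, ref_base, alt_base, depth, alt_fraction) candidates."""
    snps = []
    for pos, counts in enumerate(pileup(reference, aligned_reads)):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little coverage to judge this position
        alt_counts = {b: n for b, n in counts.items() if b != reference[pos]}
        if not alt_counts:
            continue  # every read agrees with the reference
        alt, n_alt = max(alt_counts.items(), key=lambda kv: kv[1])
        fraction = n_alt / depth
        if fraction >= min_alt_fraction:
            snps.append((pos, reference[pos], alt, depth, fraction))
    return snps

reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (2, "GAACG"), (1, "CGAA")]
print(call_snps(reference, reads))  # [(3, 'T', 'A', 3, 1.0)]
```

All three reads covering position 3 carry A where the reference has T, so that position is reported with depth 3. Positions covered by fewer than three reads are skipped.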


2.3 Several Tools to Facilitate Data Analysis


2.3.1 Artemis


Artemis is a DNA sequence viewer and annotation tool written in Java. Users can download it for free and run it on systems including UNIX, GNU/Linux, Macintosh, and Windows.
Artemis reads entries in EMBL and GenBank format as well as files in FASTA format. It then visualizes sequence features, next-generation sequencing data, and the results of analyses in the context of the sequence, together with its six-frame translation.
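Artemis computes the six-frame translation for you. The sketch below only illustrates what that view contains: the three forward reading frames of a sequence and the three frames of its reverse complement, each translated with the standard genetic code. It is not Artemis code; the function names are hypothetical, and tools differ in how they label the reverse-strand frames.

```python
# Standard genetic code with codons ordered TCAG x TCAG x TCAG ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate codon by codon; unknown codons become X, a trailing partial codon is dropped."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))

def six_frame_translation(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames

for frame, protein in six_frame_translation("ATGGCCTAA").items():
    print(frame, protein)
```

For the short test sequence ATGGCCTAA, frame +1 reads MA* (Met, Ala, stop). Frame -1, read from the reverse complement TTAGGCCAT, gives LGH.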

2.3.2 Arlequin

Arlequin is an integrated software package for population genetics data analysis. It provides methods to analyze patterns of genetic diversity within and between population samples [6].
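Two common summaries of within-sample diversity for DNA sequences are the mismatch distribution, which is the frequency of each number of pairwise differences among sequences, and nucleotide diversity π, the mean number of pairwise differences per site. The sketch below computes both for a set of aligned sequences to show what these statistics measure. It is not Arlequin's implementation. Arlequin also handles gaps, missing data, and haplotype frequencies, whereas this hypothetical sketch assumes equal-length sequences and compares every character literally.

```python
from collections import Counter
from itertools import combinations

def pairwise_differences(seqs):
    """Count the differing sites for every pair of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [sum(a != b for a, b in zip(s1, s2))
            for s1, s2 in combinations(seqs, 2)]

def mismatch_distribution(seqs):
    """Map k -> number of sequence pairs that differ at exactly k sites."""
    return Counter(pairwise_differences(seqs))

def nucleotide_diversity(seqs):
    """Mean pairwise differences divided by sequence length (pi per site)."""
    diffs = pairwise_differences(seqs)
    return sum(diffs) / len(diffs) / len(seqs[0])

sample = ["AAAA", "AAAT", "AATT"]
print(mismatch_distribution(sample))  # Counter({1: 2, 2: 1})
print(nucleotide_diversity(sample))
```

The three pairs differ at 1, 2, and 1 sites. The mean is therefore 4/3 differences, and dividing by the 4 sites gives π = 1/3.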
The software is freely available at http://cmpg.unibe.ch/software/arlequin3. It accepts several types of data, including DNA sequences, standard multilocus genotypes, RFLP data, and microsatellite data.
It is a powerful package capable of many analyses, including molecular diversity, mismatch distribution, computation of standard genetic diversity indices, as well as the estimation of allele and

DNA Sequencing Data Analysis 7