The key to data analysis is data mining, which is based on
sequence similarity. The most common approach to similarity
research is DNA sequence alignment, which finds the optimal
match between sequences according to a given similarity matrix and
identifies probable insertions, deletions, and mutations.
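
As a minimal sketch of what "optimal match" means here, the following Python function aligns two sequences with the classic Needleman-Wunsch dynamic-programming method. It is illustrative only: the function name `global_align` and the scoring values (match = 1, mismatch = -1, gap = -2) are assumptions standing in for a real similarity matrix, and the text above does not prescribe a particular algorithm.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch alignment with an assumed match/mismatch/gap scheme.

    Returns (score, aligned_a, aligned_b); '-' marks an insertion or deletion.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # match or mutation
                score[i - 1][j] + gap,      # gap in b (deletion)
                score[i][j - 1] + gap,      # gap in a (insertion)
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Calling `global_align("ACGT", "ACT")` aligns the three shared bases and reports the missing G as a gap in the second sequence. In practice the fixed match and mismatch values would be replaced by a lookup in the chosen similarity matrix.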
3.1.1 Two Stages of DNA Sequence Analysis
Analyzing nucleic acid sequences with computer programs can be
divided into two stages:
- The first stage is the straightforward search for sequences with
known properties, which involves position determination.
- The second stage aims to detect subtle, less straightforward
sequence patterns, including controlling elements such as
promoters. The results can be presented as catalogs of sequence
patterns.
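
The first stage can be illustrated with a small Python sketch that reports where a known sequence occurs. The function name `find_motif` and the example motif are hypothetical, and exact string matching is only one assumed reading of "search for sequences with known properties".

```python
def find_motif(sequence, motif):
    """Return the 0-based start positions of every exact match of motif.

    Overlapping matches are included because the search resumes one
    base after the previous hit.
    """
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


# Example: locate the motif GAATTC in a short made-up sequence.
hits = find_motif("GAATTCAAGAATTC", "GAATTC")
```

The second stage, detecting subtle patterns such as promoters, generally needs statistical models rather than exact matching, so it is not covered by this sketch.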
3.1.2 Two Categories of Computational Approaches
Computational approaches to sequence alignment generally fall
into two categories: global alignments and local alignments.
- Calculating a global alignment is a form of global optimization
that “forces” the alignment to span the entire length of all
query sequences.
- Local alignments identify regions of similarity within long
sequences that are often widely divergent overall. Local
Table 2
Several tools for data analysis

Function            Name      Site
Plot                ggplot2   http://docs.ggplot2.org/current/
                    circos    http://circos.ca/
Mapping             BWA       http://bio-bwa.sourceforge.net/
                    Bowtie2   http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
SNP/indel calling   samtools  http://samtools.sourceforge.net/samtools.shtml
                    gatk      http://www.broadinstitute.org/gatk/
                    pindel    http://gmt.genome.wustl.edu/pindel/0.2.4/index.html
Analysis tools      plink     http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml
                    ngsTools  https://github.com/mfumagalli/ngsTools
Structure analysis  frappe    http://med.stanford.edu/tanglab/software/frappe.html
                    structure http://pritchardlab.stanford.edu/structure.html
                    ngsAdmix  http://www.popgen.dk/software/index.php/NgsAdmix
Databases           DDBJ      http://www.ddbj.nig.ac.jp/index-e.html
                    ENA       http://www.ebi.ac.uk/ena/home
                    KEGG      http://www.genome.jp/kegg/
                    ensembl   http://asia.ensembl.org/index.html