Computational Systems Biology Methods and Protocols.7z

The key to data analysis is data mining, which rests on sequence
similarity. The most common approach to similarity analysis is
DNA sequence alignment, which finds the optimal match between
sequences under a given similarity matrix and identifies probable
insertions, deletions, and mutations.
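
To make the notion of an "optimal match" concrete, the following minimal sketch scores one fixed, already gapped alignment under a similarity matrix. The matrix values (+1 for identical bases, -1 for a substitution) and the linear gap penalty of -2 are assumptions chosen for illustration, not values taken from this chapter, and the function name alignment_score is hypothetical. An alignment program searches over all gapped arrangements for the one that maximizes this kind of score.

```python
# Toy similarity matrix (assumed values): +1 for identical bases, -1 otherwise.
BASES = "ACGT"
SIMILARITY = {(a, b): (1 if a == b else -1) for a in BASES for b in BASES}
GAP_PENALTY = -2  # assumed linear cost per inserted or deleted base


def alignment_score(row_a, row_b):
    """Score two equal-length aligned rows; '-' marks an insertion or deletion."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist of gaps only")
        if x == "-" or y == "-":
            score += GAP_PENALTY          # indel column
        else:
            score += SIMILARITY[(x, y)]   # match, or mutation (mismatch)
    return score


# Two candidate alignments of ACGTA and ACTTA:
print(alignment_score("ACGTA", "ACTTA"))    # one substitution, no gaps
print(alignment_score("ACGT-A", "AC-TTA"))  # one deletion plus one insertion
```

Under these assumed parameters the ungapped candidate scores higher, so it would be preferred; a different matrix or gap penalty can change which arrangement wins.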

3.1.1 Two Stages of DNA Sequence Analysis


Analyzing nucleic acid sequences with computer programs can be
divided into two stages:


  1. The first stage is the straightforward search for sequences with
    known properties, which involves position determination.

  2. The second stage aims to detect subtle, less straightforward
    sequence patterns, including controlling elements such as
    promoters. The results can be presented as catalogs of
    sequence patterns.
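
The two stages can be sketched in a few lines, assuming a pattern is written either as exact bases or with IUPAC degenerate codes. The helper names find_pattern_positions and build_catalog, the example sequence, and the two patterns are hypothetical and serve only as an illustration; real promoter detection relies on considerably richer models than a consensus string.

```python
import re

# Subset of the IUPAC nucleotide codes, translated to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "W": "[AT]", "S": "[CG]", "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def find_pattern_positions(sequence, pattern):
    """Return 0-based start positions of all matches, overlapping ones included."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    regex = "".join(IUPAC[code] for code in pattern.upper())
    # A lookahead matches without consuming, so overlapping hits are reported.
    return [m.start() for m in re.finditer(f"(?={regex})", sequence.upper())]


def build_catalog(sequence, patterns):
    """Map each pattern name to the positions where it occurs in the sequence."""
    return {name: find_pattern_positions(sequence, p) for name, p in patterns.items()}


sequence = "CCTATAAATCC"            # made-up example sequence
patterns = {
    "exact motif": "TATAAA",        # stage 1: known sequence, position lookup
    "degenerate motif": "TATAWAW",  # stage 2: looser consensus pattern
}
print(build_catalog(sequence, patterns))
```

The returned dictionary is a very small instance of a catalog of sequence patterns: one entry per pattern, each listing the positions found.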


3.1.2 Two Categories of Computational Approaches


Computational approaches to sequence alignment generally fall
into two categories: global alignments and local alignments.


  1. Calculating a global alignment is a form of global optimization
    that “forces” the alignment to span the entire length of all
    query sequences.

  2. Local alignments identify regions of similarity within long
    sequences that are often widely divergent overall. Local
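
The contrast between the two categories can be sketched with the classic dynamic-programming recurrences: Needleman-Wunsch for the global case and Smith-Waterman for the local case. The text above does not name specific algorithms, so this choice, together with the match, mismatch, and gap values and the example sequences, is an assumption made for illustration. Only scores are computed, not the alignments themselves.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score: both sequences must be aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]   # leading gaps are penalized
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman score: best-scoring pair of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)                     # an alignment may start anywhere
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)  # 0 restarts
            curr.append(cell)
            best = max(best, cell)                # an alignment may end anywhere
        prev = curr
    return best


# Two made-up sequences that share only the short region "ACG".
a, b = "TTTACGTTTT", "GGGACGGG"
print("global:", global_score(a, b))
print("local: ", local_score(a, b))
```

The only differences between the two functions are the initialization, the floor at zero, and where the final score is read from; these are what allow the local variant to ignore the divergent flanks.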


Table 2
Several tools for data analysis


Function            Name       Site
Plot                ggplot2    http://docs.ggplot2.org/current/
                    circos     http://circos.ca/
Mapping             BWA        http://bio-bwa.sourceforge.net/
                    Bowtie2    http://bowtie-bio.sourceforge.net/bowtie2/index.shtml
SNP/indel calling   samtools   http://samtools.sourceforge.net/samtools.shtml
                    gatk       http://www.broadinstitute.org/gatk/
                    pindel     http://gmt.genome.wustl.edu/pindel/0.2.4/index.html
Analysis tools      plink      http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml
                    ngsTools   https://github.com/mfumagalli/ngsTools
Structure analysis  frappe     http://med.stanford.edu/tanglab/software/frappe.html
                    structure  http://pritchardlab.stanford.edu/structure.html
                    ngsAdmix   http://www.popgen.dk/software/index.php/NgsAdmix
Databases           DDBJ       http://www.ddbj.nig.ac.jp/index-e.html
                    ENA        http://www.ebi.ac.uk/ena/home
                    KEGG       http://www.genome.jp/kegg/
                    ensembl    http://asia.ensembl.org/index.html

DNA Sequencing Data Analysis 9