Computational Systems Biology Methods and Protocols

fragments, and to compare the sequence with a reference genome.
Biologists also examine other characteristics of the sequence that
may determine its biological function. For these reasons, the raw
sequencing data must be analyzed before any further study.

2.1 General Steps of DNA Sequencing Data Analysis


Generally, DNA sequencing data analysis includes the following four steps:
• Trimming of overlapping sequences.
• Multiple alignment of template sequences.
• Consistency check between the called sequence text and the chromatogram peak data.
• Review and correction of software misreads.
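The third step can be illustrated with a small consistency check. The sketch below is an assumption-laden illustration, not the authors' procedure: it supposes that, for every called base, the peak height in each of the four channels (A, C, G, T) at that position is already available as a dictionary (a hypothetical input format). It flags positions where the called base is not the tallest peak, which are the spots a reviewer would then inspect in the chromatogram.

```python
from typing import Dict, List


def inconsistent_calls(calls: str, peaks: List[Dict[str, float]]) -> List[int]:
    """Return 0-based positions where the called base is not the tallest peak.

    `peaks` is a hypothetical per-position record of peak heights in the
    A, C, G and T channels. Ambiguous calls such as "N" never match a
    channel, so they are always flagged for review as well.
    """
    if len(calls) != len(peaks):
        raise ValueError("need one peak record per called base")
    flagged = []
    for i, (base, heights) in enumerate(zip(calls, peaks)):
        tallest = max(heights, key=heights.get)  # channel with the highest peak
        if base.upper() != tallest:
            flagged.append(i)  # candidate misread to check by eye
    return flagged


# Illustrative peak heights (made up for the example).
peaks = [
    {"A": 900, "C": 40, "G": 30, "T": 20},
    {"A": 50, "C": 60, "G": 800, "T": 70},
    {"A": 30, "C": 700, "G": 20, "T": 650},
]
print(inconsistent_calls("AGT", peaks))  # [2]
```

Here position 2 is called T although the C peak is slightly taller, so it is flagged; the function only points at suspicious spots and leaves the final decision to the reviewer.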

More specifically, DNA sequencing technology, especially Sanger
sequencing, produces data in the form of a chromatogram: a series of
peaks in four colors. When the result file is opened in software
such as Chromas Lite, red, black, green, and blue peaks are
displayed, each color corresponding to a different DNA base. At both
ends of the chromatogram, roughly 50 bases are difficult to call.
This is normal and is attributed to impurities.
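As a minimal sketch of the end-trimming described above, assuming the base calls are available as a plain string, one can simply cut a fixed number of bases (50 by default, following the rule of thumb in the text) from each end. The color table records the commonly used ABI/Chromas convention; treat it as an assumption and confirm it in your own viewer. The function name `trim_ends` is hypothetical, and a fixed cut is a simplification of quality-based trimming.

```python
# Commonly used ABI/Chromas color convention (assumption: check your viewer).
PEAK_COLORS = {"A": "green", "C": "blue", "G": "black", "T": "red"}


def trim_ends(calls: str, n: int = 50) -> str:
    """Drop n hard-to-read bases from each end of a base-call string.

    Returns an empty string when the read is not longer than 2 * n,
    because nothing reliable would be left.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if len(calls) <= 2 * n:
        return ""
    # calls[n:-n] would return "" for n == 0, so slice explicitly.
    return calls[n:len(calls) - n]


read = "N" * 50 + "ACGT" * 10 + "N" * 50
print(trim_ends(read))  # ACGTACGTACGTACGTACGTACGTACGTACGTACGTACGT
```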
When screening the chromatogram, we are likely to find spots with two
overlapping peaks, and such a spot may appear to be a heterozygous
site. However, the situation is more complicated when the two
overlapping peaks have different axes, or when they share one axis
but are of the same height. Such a spot is not a heterozygous site,
because one of the peaks is an interference peak. Most often, an
interference peak about half the height of a large base peak appears
one or two positions before it. The closer the two peaks are, the
stronger the interference. In these situations the base-calling
software often makes mistakes, and a human must step in to correct
the misreads.
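The pattern just described (an interference peak about half as tall as a large peak located one or two positions after it) can be turned into a simple screening heuristic. The sketch below assumes a hypothetical input: the main peak height at each base position, plus a list of minor peaks given as (base position, height). The ratio window `lo`/`hi` around one half is an illustrative choice, not a value from the text.

```python
from typing import List, Tuple


def likely_interference(
    main_heights: List[float],
    minor_peaks: List[Tuple[int, float]],
    lo: float = 0.35,
    hi: float = 0.65,
) -> List[int]:
    """Flag minor peaks that look like interference from a following big peak.

    A minor peak at base position i is flagged when the main peak at
    position i + 1 or i + 2 is roughly twice as tall, i.e. the
    minor/main height ratio falls within [lo, hi] (illustrative bounds).
    """
    flagged = []
    for i, h_minor in minor_peaks:
        for offset in (1, 2):  # one or two positions before a big peak
            j = i + offset
            if j < len(main_heights) and main_heights[j] > 0:
                ratio = h_minor / main_heights[j]
                if lo <= ratio <= hi:
                    flagged.append(i)
                    break  # one matching big peak is enough
    return flagged


main = [400, 420, 1000, 380, 410]  # made-up main peak heights
minor = [(1, 480), (3, 60)]        # made-up minor peaks
print(likely_interference(main, minor))  # [1]
```

The minor peak at position 1 is about half the height of the large peak at position 2 and is flagged; the small peak at position 3 is not. Flagged positions are candidates for manual review, not automatic corrections.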
After checking a large amount of software output, we have derived
some rules to help determine whether the results are accurate:


  1. The main peak usually lies to the right of the interference
    peak.

  2. The interference peak can be higher than, lower than, or the
    same height as the main peak.
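Taken together, the two rules say that position, not height, decides which of two offset overlapping peaks is the main one. A minimal sketch, assuming each peak is described by its base, the scan index of its apex, and its height (a hypothetical `Peak` record), could look like this. Peaks whose apexes fall within `same_axis_tol` scans of each other (an illustrative tolerance) are treated as sharing one axis and returned as "N" for a human to decide.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    base: str      # "A", "C", "G" or "T"
    scan: int      # position of the peak apex on the trace (scan index)
    height: float  # signal intensity at the apex


def resolve_overlap(p: Peak, q: Peak, same_axis_tol: int = 1) -> str:
    """Suggest a base call for two overlapping peaks.

    Apexes within same_axis_tol scans are treated as sharing one axis and
    left for manual review ("N"). Otherwise the right-hand peak is taken as
    the main peak (rule 1); height is ignored on purpose, because the
    interference peak can be higher, lower or equal (rule 2).
    """
    if abs(p.scan - q.scan) <= same_axis_tol:
        return "N"  # shared axis: needs a human decision
    main = p if p.scan > q.scan else q
    return main.base


print(resolve_overlap(Peak("G", 1203, 900.0), Peak("A", 1209, 450.0)))  # A
print(resolve_overlap(Peak("C", 500, 300.0), Peak("T", 500, 310.0)))    # N
```

In the first call the taller G peak sits to the left, so it is treated as interference and the lower A peak on the right is kept.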


Therefore, to reduce misreads, we usually carry out several
procedures:


  1. The called sequence text must be checked for consistency
    against sequences in the gene database and against the
    chromatogram peak data.
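The database side of this check can be sketched as a position-by-position comparison between the read and a reference sequence retrieved from a gene database. This is a simplification under a stated assumption: the read is taken to start at a known `offset` in the reference with no insertions or deletions, whereas real reads normally need a proper alignment first. The function name `mismatches` is hypothetical. Each reported position is a spot to re-examine in the chromatogram.

```python
from typing import List, Tuple


def mismatches(calls: str, reference: str, offset: int = 0) -> List[Tuple[int, str, str]]:
    """List positions where the read disagrees with an aligned reference.

    Assumes the read starts at `offset` in `reference` without gaps.
    Returns (read_position, called_base, reference_base) tuples.
    """
    diffs = []
    for i, base in enumerate(calls.upper()):
        j = offset + i
        if j >= len(reference):
            break  # read runs past the end of the reference
        ref_base = reference[j].upper()
        if base != ref_base:
            diffs.append((i, base, ref_base))
    return diffs


reference = "TTACGGATCCA"  # made-up reference fragment
print(mismatches("ACGTATC", reference, offset=2))  # [(3, 'T', 'G')]
```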


DNA Sequencing Data Analysis