Computational Systems Biology Methods and Protocols.7z

fragments, and to compare the sequence with a reference genome. Biologists also pay attention to other characteristics of the sequence that might determine its biological features. This is why data analysis is needed before further study.

2.1 General Steps of DNA Sequencing Data Analysis

Generally, DNA sequencing data analysis includes these four steps:

• Trimming of overlapping sequences.
• Multiple alignments of template sequences.
• Consistency check between reading text and chromatogram peak data.
• Review and correction of software misreads.

More precisely, DNA sequencing technology, especially Sanger sequencing, yields data in the form of a chromatogram: a series of peaks in four different colors. When the result file is opened in software such as Chromas Lite, it shows red, black, green, and blue peaks, each color corresponding to a different DNA base. At both ends of the chromatogram there are about 50 bases that are difficult to recognize. This is caused by impurities and is a normal phenomenon. When screening the chromatogram, we are likely to find two overlapping peaks, and such a spot appears to represent a heterozygous locus. However, things get more complicated when the two overlapping peaks have different axes, or when the two peaks share one axis but are of the same height. Such a spot is not a heterozygous locus, because one of the peaks is an interference peak. In most cases an interference peak, whose height is approximately half that of the big peak, appears one or two positions before a big base peak. The closer the two peaks are, the stronger the interference. Under these circumstances the computer often makes mistakes, and this is where humans step in and correct the misreads. After checking a large amount of software output, we have derived some rules that help determine whether the results are accurate:

The main peak usually sits to the right of the interference peak.

The interference peak can be higher than, lower than, or the same height as the main peak.
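The first rule above can be sketched in code. The following is a minimal illustration, not the procedure of any particular base-calling software: the `Peak` record, the `pick_main_peak` function, and the `same_axis_tol` tolerance are hypothetical names introduced here, and peak positions and heights are assumed to have already been extracted from the chromatogram. Because the second rule says that height is unreliable, the sketch decides by position only and leaves peaks that share one axis to manual review.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Peak:
    """Hypothetical record for one chromatogram peak."""
    base: str        # called base: "A", "C", "G", or "T"
    position: float  # x coordinate of the peak apex (assumed unit: scan index)
    height: float    # signal intensity at the apex (not used for the decision)


def pick_main_peak(
    a: Peak, b: Peak, same_axis_tol: float = 0.5
) -> Optional[Tuple[Peak, Peak]]:
    """Return (main, interference) for two overlapping peaks.

    The peak further to the right is treated as the probable main peak.
    Returns None when both peaks share one axis (within same_axis_tol),
    because position alone cannot separate them; such spots are left
    for manual inspection.
    """
    if abs(a.position - b.position) <= same_axis_tol:
        return None
    return (a, b) if a.position > b.position else (b, a)


# Usage: the right-hand peak is chosen regardless of its height.
left = Peak(base="A", position=100.0, height=600.0)
right = Peak(base="G", position=103.0, height=300.0)
result = pick_main_peak(left, right)
```

The result is only a suggestion for the reviewer, since the rule holds "usually" rather than always.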

To reduce misreads, we usually follow several procedures:

A consistency check must be done among the reading text, the results in the gene pool, and the chromatogram peak data.
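As an illustration of the consistency check, the sketch below compares the reading text with a reference sequence and lists the positions that disagree, so that each can be looked up in the chromatogram. It is a simplified example under stated assumptions: the two sequences are assumed to be already aligned and of equal length, the function name `find_mismatches` and the parameter `skip_ends` are hypothetical, and the default of 50 skipped bases per end simply reflects the roughly 50 hard-to-recognize bases at each end of the chromatogram mentioned above.

```python
from typing import List, Tuple


def find_mismatches(
    read: str, reference: str, skip_ends: int = 50
) -> List[Tuple[int, str, str]]:
    """List (index, read_base, reference_base) where the sequences differ.

    Assumes both sequences are already aligned to the same length.
    The first and last skip_ends positions are ignored because the
    ends of a chromatogram are hard to read reliably.
    """
    if len(read) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    start, stop = skip_ends, len(read) - skip_ends
    return [
        (i, read[i], reference[i])
        for i in range(start, stop)  # empty range if the read is too short
        if read[i].upper() != reference[i].upper()
    ]


# Usage with short toy sequences (skip_ends lowered so something is left).
toy_read = "ACGTACGTAC"
toy_reference = "ACGTTCGTAC"
suspects = find_mismatches(toy_read, toy_reference, skip_ends=2)
# suspects == [(4, "A", "T")]
```

Each reported index is a position to inspect by eye in the chromatogram rather than a confirmed error, since a disagreement may be a software misread or a real difference from the reference.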

