158 7 Sequence Similarity Searching Tools
10 log 10 units, which is roughly the same as third-bit units. BLOSUM matri-
ces are usually scaled in half-bit units. In either type of scoring matrix, if the
score is 0, then the alignment of the amino acid pair is equivalent to being
coincidental; if the score is positive, the alignment of the amino acid pair is
found to be more often than by chance; if the score is negative, the align-
ment of the amino acid pair is found to be even less often than by chance.
The NCBI BLAST tool allows one to choose from a variety of scoring matri-
ces, including PAM30, PAM70, BLOSUM45, BLOSUM62, and BLOSUM80. A
more complete roster of scoring matrices (PAM10-PAM500, and BLOSUM30-
BLOSUM100) is available at the following ftp site:ftp://ftp.ncbi.nlm.
nih.gov/blast/matrices.
Mutational events include not only substitutions but also insertions and
deletions. Consequently one must also consider the possibility of alignment
gaps. However, gaps are a form of sequence mismatch, so they affect the
score negatively. During the process of alignment, the initiation of a new
gap adds a penalty called anopeninggap penalty, while the widening of
an existing gap adds anextensiongap penalty. For amino acid sequences,
it is common to set extension gap penalties to be lower than opening gap
penalties because certain protein domains evolve as a unit, rather than as
single residues.
Summary
- Sequence similarity search is a process whereby a query sequence is com-
pared with sequences in a database to find the best matches. - The score depends on the scoring matrix and the gap penalties.
- The scoring matrix can be position-independent (substitution matrix) or
position-sensitive. - The most commonly used substitution matrices are PAM and BLOSUM.
7.2 Dynamic Programming Algorithm
The first algorithm that was used for sequence matching was a dynamic
programming algorithm, called the Needleman-Wunsch algorithm (Needle-
man and Wunsch 1970). A dynamic programming algorithm finds an op-
timal solution by breaking the original composite problem recursively into