is detected if it contains two nonoverlapping words of length W whose
scores are at least the threshold T, with starting positions that differ by
no more than A residues. The gapped BLAST algorithm uses the two-hit
method (Altschul et al. 1997). The two hits are extended in both directions
by means of dynamic programming. In the two-hit method, a smaller
threshold T can be used because the requirement that two hits occur near
each other limits the number of hits that qualify. As the name suggests,
the gapped BLAST algorithm can introduce gaps into the alignments it builds.
- Evaluation step. BLAST determines the statistical significance of each of
the HSPs obtained in the word extension step and gives a report on the
HSPs that have been found. This report is discussed in more detail in
subsection 7.4.3 below.
Sometimes two or more segment pairs can be merged into a single,
longer segment. In such cases, a joint assessment of the statistical
significance can be made using the Poisson method or the sum-of-scores
method. The earliest BLAST versions used the Poisson method, while
more recent BLAST versions (including WU-BLAST and gapped BLAST)
use the sum-of-scores method.
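The two-hit criterion from the word extension step can be illustrated with a short Python sketch. It is not the BLAST implementation: the function name two_hit_pairs and the input format are hypothetical, and the requirement that both hits lie on the same diagonal (the same offset between subject and query positions) is taken from Altschul et al. (1997) rather than from the description above. The word hits passed in are assumed to have already scored at least T.

```python
from collections import defaultdict


def two_hit_pairs(hits, W, A):
    """Return pairs of word hits that satisfy a two-hit criterion.

    hits: iterable of (query_start, subject_start) positions of word hits
          that already score at least the threshold T (assumed input).
    W:    word length; two hits overlap if their starts differ by less than W.
    A:    maximum allowed distance between the two starting positions.
    """
    # Group hits by diagonal (subject_start - query_start); this grouping
    # is an assumption of the sketch, following Altschul et al. (1997).
    by_diagonal = defaultdict(list)
    for q, s in hits:
        by_diagonal[s - q].append(q)

    pairs = []
    for diagonal, starts in by_diagonal.items():
        starts.sort()
        for i, q1 in enumerate(starts):
            for q2 in starts[i + 1:]:
                distance = q2 - q1
                if distance > A:
                    break  # later hits on this diagonal are even farther away
                if distance >= W:  # the two words do not overlap
                    pairs.append(((q1, q1 + diagonal), (q2, q2 + diagonal)))
    return pairs


example_hits = [(0, 10), (5, 15), (1, 11), (50, 60), (3, 20)]
example_pairs = two_hit_pairs(example_hits, W=3, A=40)
```

For simplicity the sketch lists every qualifying pair on a diagonal; a production search would more likely keep only the most recent hit per diagonal and trigger an extension as soon as a second hit qualifies.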
FASTA differs from BLAST primarily in that FASTA strives to get exact
“word” matches, whereas BLAST uses a scoring matrix (such as the default
BLOSUM62 for amino acid sequences) to search for words that may
not match exactly, but are high-scoring nevertheless. FASTA does not have a
preprocessing step as in BLAST, and FASTA does not use the BLAST strategy
of extending seeds using sophisticated dynamic programming. Both FASTA
and BLAST have a word generation step which does not allow gaps, followed
by a Smith-Waterman alignment step that can introduce gaps.
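The difference between exact word matching and matrix-scored word matching can be made concrete with a small Python sketch. Everything in it is illustrative: the four-letter alphabet, the toy_score function (which stands in for a real matrix such as BLOSUM62 and does not reproduce its values), and the function names are assumptions made for this example, not part of FASTA or BLAST.

```python
from itertools import product

ALPHABET = "ACDE"  # toy alphabet, not the full amino acid alphabet


def toy_score(a, b):
    """Hypothetical substitution score: identical 4, D/E similar 2, else -2."""
    if a == b:
        return 4
    if {a, b} == {"D", "E"}:
        return 2
    return -2


def neighborhood(word, T, alphabet=ALPHABET, score=toy_score):
    """All words of the same length that score at least T against `word`."""
    return sorted(
        "".join(candidate)
        for candidate in product(alphabet, repeat=len(word))
        if sum(score(a, b) for a, b in zip(word, candidate)) >= T
    )


def exact_word_hits(query, subject, k):
    """Exact-match seeding: positions (i, j) where the k-letter words are identical."""
    return [
        (i, j)
        for i in range(len(query) - k + 1)
        for j in range(len(subject) - k + 1)
        if query[i:i + k] == subject[j:j + k]
    ]


def neighborhood_word_hits(query, subject, k, T):
    """Matrix-scored seeding: subject word lies in the neighborhood of the query word."""
    hits = []
    for i in range(len(query) - k + 1):
        neighbors = set(neighborhood(query[i:i + k], T))
        for j in range(len(subject) - k + 1):
            if subject[j:j + k] in neighbors:
                hits.append((i, j))
    return hits


exact = exact_word_hits("ADE", "CAEEC", k=3)
scored = neighborhood_word_hits("ADE", "CAEEC", k=3, T=10)
```

In this toy case the query word ADE has no exact match in the subject, so exact seeding finds nothing, while the matrix-scored search accepts the subject word AEE because it still reaches the threshold T.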
Summary
- BLAST uses a heuristic approach to find alignments quickly.
- The BLAST algorithm consists of these steps:
  - Preprocessing: omit uninformative regions of the query.
  - Word generation: generate small seed matches.
  - Word extension: extend single seeds or pairs of seeds.
  - Evaluation: compute measures of significance.