
Summary



  • FASTA is a set of sequence similarity search programs.

  • FASTA is also a sequence format, and this is currently the main use for
    FASTA.


7.4 BLAST


The most widely used tool for sequence alignment is BLAST (McGinnis and
Madden 2004), and it plays an important role in genome annotation (Muller
et al. 1999). BLAST uses a heuristic approach to construct alignments based
on optimizing a measure of local similarity (Altschul et al. 1990, 1997).
Because of its heuristic nature, BLAST searches much faster than the main
dynamic programming methods: the Needleman-Wunsch (Needleman and
Wunsch 1970) and Smith-Waterman (Smith and Waterman 1981) algorithms.
In this section we begin by explaining the BLAST algorithm. The algorithm
is then used for a number of types of search, as presented in subsection 7.4.2.
The result of a BLAST search is a collection of matching sequences (or “hits”).
Each hit is given a number of scores that attempt to measure how well the
hit matches the query. These scores are explained in subsection 7.4.3. We end
the section with some variations on the BLAST algorithm.
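
For comparison with the heuristic approach, the following is a minimal sketch of the Smith-Waterman local alignment score, the exhaustive dynamic programming baseline mentioned above. The function name and the scoring values (match, mismatch, and a linear gap penalty) are illustrative assumptions, not parameters taken from any BLAST release. Every cell of the table is filled in, so the running time grows with the product of the two sequence lengths.

```python
def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring only: +1 match, -1 mismatch, -2 per gap position.
    """
    prev = [0] * (len(b) + 1)   # previous row of the dynamic programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                    # start a new local alignment here
                prev[j - 1] + sub,    # align a[i-1] with b[j-1]
                prev[j] + gap,        # gap in b
                curr[j - 1] + gap,    # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTTT", "GGACGTGG"))
```

Only the score is computed here; recovering the alignment itself would require keeping the full table and tracing back from the best cell.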

7.4.1 The BLAST Algorithm


BLAST has become the most popular sequence search tool among biologists. There are two
main versions of BLAST:
NCBI BLAST http://www.ncbi.nlm.nih.gov/blast
This is the version that is most commonly used (Altschul et al. 1990, 1997).

WU-BLAST blast.wustl.edu
Washington University BLAST (Altschul et al. 1990; Gish and States 1993;
States and Gish 1994).

The central idea of the BLAST algorithm is that a statistically significant
alignment is likely to contain a high-scoring matching “word.” BLAST is a
heuristic that attempts to optimize a specific measure of sequence similarity,
based on a “threshold” parameter. In terms of time complexity, the BLAST
algorithm requires time proportional to the product of the lengths of the query
sequence and the target database.
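
The word-based idea can be illustrated with a simplified seed-and-extend sketch for nucleotide sequences. This is not the NCBI or WU-BLAST implementation: it assumes exact word matches only (as if the threshold admitted only identical words), ungapped extension, and hypothetical names and defaults (`index_words`, `extend_hit`, `seed_and_extend`, `w`, `x_drop`, `min_score`) chosen for illustration. Note that `min_score` is a reporting cutoff for this sketch, not the BLAST threshold parameter.

```python
from collections import defaultdict


def index_words(seq, w):
    """Map every length-w word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - w + 1):
        index[seq[i:i + w]].append(i)
    return index


def extend_hit(query, target, qi, ti, w, match=1, mismatch=-1, x_drop=3):
    """Extend an exact word hit without gaps, first right, then left.

    Extension in a direction stops once the running score falls x_drop
    below the best score seen. Returns (score, query_start, target_start,
    length) for the best-scoring segment found.
    """
    best = w * match
    score, right, k = best, 0, 0
    while qi + w + k < len(query) and ti + w + k < len(target):
        score += match if query[qi + w + k] == target[ti + w + k] else mismatch
        k += 1
        if score > best:
            best, right = score, k
        elif best - score >= x_drop:
            break
    score, left, k = best, 0, 0
    while qi - k - 1 >= 0 and ti - k - 1 >= 0:
        score += match if query[qi - k - 1] == target[ti - k - 1] else mismatch
        k += 1
        if score > best:
            best, left = score, k
        elif best - score >= x_drop:
            break
    return best, qi - left, ti - left, left + w + right


def seed_and_extend(query, target, w=4, min_score=6):
    """Find word hits between query and target and keep the extended segments
    whose score reaches min_score, sorted from best to worst."""
    index = index_words(target, w)
    segments = set()   # a set removes duplicates found from different seeds
    for qi in range(len(query) - w + 1):
        for ti in index.get(query[qi:qi + w], []):
            segment = extend_hit(query, target, qi, ti, w)
            if segment[0] >= min_score:
                segments.add(segment)
    return sorted(segments, reverse=True)


if __name__ == "__main__":
    for score, q_start, t_start, length in seed_and_extend(
        "GGACGTACGTGG", "TTTTACGTACGTTTTT"
    ):
        print(score, query_segment := "GGACGTACGTGG"[q_start:q_start + length], t_start)
```

Indexing the target once and looking up query words avoids filling a full dynamic programming table; only the neighborhoods of word hits are examined, which is where the speed of a heuristic of this kind comes from.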