untitled

(ff) #1
7.3 FASTA 159

smaller and smaller problems until the smallest problems have trivial solu-
tions. The smaller solutions are then used to construct the solutions for the
larger and larger parts of the original problem until the original problem has
been solved. In this case, the composite problem is to determine the optimal
alignment of the two sequences at their full lengths. This alignment prob-
lem is split by breaking down the two sequences into smaller segments. The
splitting continues recursively until the subproblem consists of comparing
two residues. At this point the score is obtained from the scoring matrix. The
resulting alignment is guaranteed to be globally optimal. Smith and Water-
man (1981) modified the Needleman-Wunsch algorithm to make it run faster
but only guaranteeing that the alignment is locally optimal.
Although the exact dynamic programming algorithm are guaranteed to
find the optimal match (either global or local), they can be very slow. This
is especially true for a full search of the very large sequence databases such
as GenBank for nucleotide sequences and SWISS-PROT for amino acid se-
quences that are commonly used today. To deal with this problem, a number
of heuristic techniques have been introduced, such as FASTA and BLAST,
that give up the guarantee of optimality for the sake of improved speed.
In practice, the effect on optimality is small, so the improvement in perfor-
mance is worth the compromise. These new algorithms search for the best
local alignment rather than the best global alignment.

Summary



  • The earliest sequence similarity searching algorithms applied exact dy-
    namic programming either globally or locally.

  • Current algorithms are heuristic methods that still use dynamic program-
    ming but apply approximations to improve performance.


7.3 FASTA


FASTA is both a collection of programs and a widely used format (Pear-
son and Lipman 1988). The programs are available fromwww.ebi.ac.uk/
fasta33. This site is also a web service for performing FASTA sequence
searching.
The FASTA algorithm was the first widely used algorithm for amino acid
and nucleotide similarity searches. The first step of the FASTA algorithm
is to find exactly matching “words” of lengthktup. The default value of
Free download pdf