untitled

7.3 FASTA 159

smaller and smaller problems until the smallest problems have trivial solutions. The smaller solutions are then used to construct the solutions for the larger and larger parts of the original problem until the original problem has been solved. In this case, the composite problem is to determine the optimal alignment of the two sequences at their full lengths. This alignment problem is split by breaking down the two sequences into smaller segments. The splitting continues recursively until the subproblem consists of comparing two residues. At this point the score is obtained from the scoring matrix. The resulting alignment is guaranteed to be globally optimal. Smith and Water- man (1981) modified the Needleman-Wunsch algorithm to make it run faster but only guaranteeing that the alignment is locally optimal. Although the exact dynamic programming algorithm are guaranteed to find the optimal match (either global or local), they can be very slow. This is especially true for a full search of the very large sequence databases such as GenBank for nucleotide sequences and SWISS-PROT for amino acid sequences that are commonly used today. To deal with this problem, a number of heuristic techniques have been introduced, such as FASTA and BLAST, that give up the guarantee of optimality for the sake of improved speed. In practice, the effect on optimality is small, so the improvement in performance is worth the compromise. These new algorithms search for the best local alignment rather than the best global alignment.

Summary

The earliest sequence similarity searching algorithms applied exact dy-
namic programming either globally or locally.

Current algorithms are heuristic methods that still use dynamic program-
ming but apply approximations to improve performance.

7.3 FASTA

FASTA is both a collection of programs and a widely used format (Pear- son and Lipman 1988). The programs are available fromwww.ebi.ac.uk/ fasta33. This site is also a web service for performing FASTA sequence searching. The FASTA algorithm was the first widely used algorithm for amino acid and nucleotide similarity searches. The first step of the FASTA algorithm is to find exactly matching “words” of lengthktup. The default value of

untitled

7.3 FASTA 159

Summary

7.3 FASTA

Get our desktop app

Company

Features

Documentation

Resources