7.4 BLAST 165
Figure 7.2 Illustration of blastx (version 2.2.10) output using a 718-bp DNA
sequence (GenBank accession number AF200505.1) encoding exon 4 of the Pongo
pygmaeus ApoE gene.
This is necessary if one must process a large number of queries. Managing
such a collection of queries becomes a problem in itself, and software has
been developed for this task. For example, BeoBLAST is a Perl program that
distributes individual BLAST jobs across the nodes of a cluster (Grant et al.
2002). Although BeoBLAST was designed for use on Linux Beowulf clusters,
it can be used on any collection of computers that satisfy a few basic
requirements, such as having a BLAST program, a web server, and the GNU queue
service. For more information about how one can configure a computer for
BeoBLAST, download it from
bioinformatics.fccc.edu/software/OpenSource/beoblast/beoblast.shtml
and read the installation instructions.
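The idea of spreading many queries over several machines can be illustrated
with a minimal Python sketch. It is not BeoBLAST's own code or scheduling
method; it only assumes that the queries arrive as FASTA-formatted text and
that dealing them out in turn to a fixed number of workers is acceptable. The
function names read_fasta and split_into_batches are hypothetical and chosen
for illustration.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(parts)


def split_into_batches(records, n_batches):
    """Deal records round-robin into n_batches lists, one per worker."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    batches = [[] for _ in range(n_batches)]
    for i, record in enumerate(records):
        batches[i % n_batches].append(record)
    return batches


# Three toy queries (made-up sequences) dealt out to two hypothetical nodes.
example = """>q1
ATGC
>q2
GGCC
TTAA
>q3
ACGT
""".splitlines()

batches = split_into_batches(read_fasta(example), 2)
for node, batch in enumerate(batches):
    print(node, [header for header, _ in batch])
```

Each batch could then be written to its own query file and handed to the
BLAST program installed on one node. That step is left out here because it
depends on the local installation.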
As a general rule, if a query is an amino acid sequence, then it is better
to search against an amino acid sequence database rather than against
a nucleotide sequence database. There are several reasons for this. First,
the genetic code is degenerate (i.e., several different codons encode
the same amino acid). Direct amino acid sequence alignment eliminates the
noise that results from the degeneracy. Second, amino acid databases tend to
be more sparsely populated than nucleotide sequences because constraints
during protein evolution are more severe than during DNA evolution. Un-