7.4 BLAST 171

that have been conserved by evolution (Kelley et al. 2004). The basic method
searches for high-scoring alignments between pairs of protein interaction
paths, for which proteins of the first path are paired with putative orthologs
occurring in the same order in the second path.


BLAT genome.ucsc.edu/cgi-bin/hgBlat
The BLAST-Like Alignment Tool is a very fast DNA/amino acid sequence
alignment tool written by Jim Kent at the University of California, Santa Cruz
(Kent 2002). It is designed to quickly find sequences of 95% and greater
similarity of length 40 bases or more. It will find perfect sequence matches of 33
bases, and sometimes find them down to 22 bases. BLAT on proteins finds
sequences of 80% and greater similarity of length 20 amino acids or more. In
practice DNA BLAT works well on primates, and protein BLAT on land
vertebrates. BLAT may miss more divergent or shorter sequence
alignments.
BLAT is similar in many ways to BLAST. The program rapidly scans for
relatively short matches (hits), and extends these into HSPs. However, BLAT
differs from BLAST in some significant ways. For instance, where BLAST
returns each area of homology between two sequences as a separate alignment,
BLAT stitches them together into a larger alignment. BLAT has special code
to handle introns in RNA/DNA alignments. Therefore, whereas BLAST
delivers a list of exons sorted by exon size, with alignments extending slightly
beyond the edge of each exon, BLAT effectively “unsplices” mRNA onto the
genome, giving a single alignment that uses each base of the mRNA only
once, and which correctly positions splice sites. BLAT is more accurate and
500 times faster than popular existing tools for mRNA/DNA alignments and
50 times faster for amino acid alignments at sensitivity settings typically used
when comparing vertebrate sequences.
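The stitching idea described above can be illustrated with a minimal Python sketch. It is not BLAT's actual code: the `Block` type, the `stitch` function, and the coordinates are hypothetical, and the sketch only chains ungapped, collinear blocks and trims overlaps so that each mRNA base is used once. It makes no attempt to position splice sites correctly, which BLAT does.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One ungapped local alignment block (hypothetical representation)."""
    q_start: int  # start in the mRNA (query), 0-based
    q_end: int    # end in the mRNA, exclusive
    t_start: int  # start in the genome (target), 0-based


def stitch(blocks):
    """Chain collinear blocks, trimming overlaps so each query base is used once."""
    chain = []
    for b in sorted(blocks):  # sorted by q_start first
        if chain:
            prev = chain[-1]
            prev_t_end = prev.t_start + (prev.q_end - prev.q_start)
            overlap = prev.q_end - b.q_start
            if overlap > 0:
                # Trim the front of the later block by the overlapping bases.
                b = Block(b.q_start + overlap, b.q_end, b.t_start + overlap)
            if b.q_start >= b.q_end or b.t_start < prev_t_end:
                continue  # block is fully contained or not collinear; skip it
        chain.append(b)
    return chain


# Three exon-level hits that each extend 5 bases past the exon edge,
# as in the BLAST-style output described in the text (made-up coordinates).
hits = [Block(0, 105, 1000), Block(100, 250, 5000), Block(245, 400, 9000)]
chain = stitch(hits)
covered = sum(b.q_end - b.q_start for b in chain)
```

In this toy input the three overlapping hits become one chain covering all 400 query bases exactly once, with the large gaps on the genome side playing the role of introns.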
BLAT’s speed stems from an index of all nonoverlapping k-mers (subsequences
of fixed length k) in the sequence database. DNA BLAT maintains an index of
all nonoverlapping sequences of length 11 in the genome, except for those
heavily involved in repeats. The index takes up a bit less than a gigabyte
of RAM. The genome itself is not kept in memory, allowing BLAT to deliver
high performance on a reasonably priced computer. The index is used to
find areas of probable homology, which are then loaded into memory for a
detailed alignment analysis. Protein BLAT works in a similar manner,
except with sequences of length 4. The protein index takes a little more than 2
gigabytes.
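As a rough illustration of the indexing scheme just described, the following Python sketch indexes the nonoverlapping fixed-length words of a target sequence and then looks up every overlapping word of a query. The function names and the toy sequences are hypothetical, a word length of 4 is used only to keep the example small, and repeat masking and the later alignment stages are omitted.

```python
from collections import defaultdict


def build_index(target, k):
    """Index the nonoverlapping k-mers of the target (positions 0, k, 2k, ...)."""
    index = defaultdict(list)
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return dict(index)


def find_hits(index, query, k):
    """Look up every overlapping k-mer of the query in the index.

    Returns (query_pos, target_pos) pairs. Hits sharing the same
    target_pos - query_pos lie on one diagonal and suggest a region
    of probable homology.
    """
    hits = []
    for qpos in range(len(query) - k + 1):
        for tpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, tpos))
    return hits


K = 4  # toy word length; the text gives 11 for DNA and 4 for protein
target = "ACGTACGGTTCAGGCTTAACCGGATATCGCGA"
query = "CAGGCTTAACCGG"

index = build_index(target, K)
hits = find_hits(index, query, K)
diagonals = {tpos - qpos for qpos, tpos in hits}
```

Because only every k-th word of the target is stored, the index holds roughly one entry per k bases of the database, while scanning all overlapping words of the query ensures that a sufficiently long exact match still meets at least one indexed word.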
BLAT has several major stages. It uses the index to find regions in the
