7.4 BLAST 169
Another difference is that the result of a PSSM search can be expressed in
terms of the probability that a type of residue occurs in each position. In
other words, the output of a PSSM search is another PSSM. The PSI-BLAST
(position-specific iterated BLAST) algorithm takes advantage of these
features of PSSM searches to improve sensitivity by iterating the BLAST
algorithm: the PSSM produced by one search is used as the query for the
next search, and the process is then repeated. PSI-BLAST is
often much better at detecting relatively weak relationships than noniterated
sequence similarity queries (Taylor 1986; Dodd and Egan 1990). Another
advantage of PSI-BLAST is that motif boundaries can be more precisely
defined. Ordinary BLAST relies on cumbersome extension and trimming
processes to determine the optimal boundary.
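The benefit of iterating can be illustrated with a toy Python sketch. It is not PSI-BLAST's actual implementation: the function names (`profile_from_hits`, `best_window`, `iterate_search`) are hypothetical, alignments are ungapped, the background distribution is assumed uniform, hits are unweighted, and a raw score threshold stands in for the E-value cutoff.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # sequences are assumed to use only these letters


def profile_from_hits(hits, pseudo=1.0):
    """Column-wise log-odds profile from equal-length hits (uniform background)."""
    bg = 1.0 / len(ALPHABET)
    profile = []
    for col in zip(*hits):
        n = len(col)
        profile.append({
            aa: math.log2(((col.count(aa) + pseudo * bg) / (n + pseudo)) / bg)
            for aa in ALPHABET
        })
    return profile


def best_window(profile, seq):
    """Return (score, segment) for the best ungapped window of seq."""
    width = len(profile)
    best = (float("-inf"), "")  # returned unchanged if seq is too short
    for start in range(len(seq) - width + 1):
        segment = seq[start:start + width]
        score = sum(col[aa] for col, aa in zip(profile, segment))
        if score > best[0]:
            best = (score, segment)
    return best


def iterate_search(query, database, threshold, max_rounds=5):
    """Search, rebuild the profile from the hits, and search again."""
    hits = [query]
    for _ in range(max_rounds):
        profile = profile_from_hits(hits)
        new_hits = [query]
        for seq in database:
            score, segment = best_window(profile, seq)
            if score >= threshold:
                new_hits.append(segment)
        if set(new_hits) == set(hits):  # converged: nothing new was found
            break
        hits = new_hits
    return hits


database = ["KKACDEYKK", "AGDEY", "WWKLM"]
print(iterate_search("ACDEF", database, threshold=10.0, max_rounds=1))
print(iterate_search("ACDEF", database, threshold=10.0))
```

In this toy database, a single round recovers only the close match `ACDEY`. Once that hit is folded into the profile, the tyrosine at the last position is no longer penalized, so the more distant `AGDEY` clears the threshold in the second round.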
The first step in the PSI-BLAST algorithm is to find all database segments
that match the query sequence with an E-value below a user-defined threshold
(say 0.01). The matching database segments are then organized as an MSA,
from which a PSSM is constructed. Closely related sequences in the MSA are
given relatively smaller weights so that they do not bias the probability
distributions. The BLAST algorithm is then applied with this PSSM as the
query, and the whole process is iterated, typically for a fixed number of
rounds or until no new matches are found.
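The weighting and PSSM construction steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the procedure PSI-BLAST itself uses: the MSA is assumed to be gap-free, the background distribution is taken as uniform, a single fixed pseudocount replaces any data-dependent scheme, and the weights follow a Henikoff-style position-based rule chosen here for illustration. The function names `position_based_weights` and `build_pssm` are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_based_weights(msa):
    """Henikoff-style weights: sequences that share residues with many
    other sequences in a column receive less weight from that column."""
    weights = [0.0] * len(msa)
    for col in range(len(msa[0])):
        column = [seq[col] for seq in msa]
        counts = Counter(column)
        distinct = len(counts)  # number of different residues in this column
        for i, residue in enumerate(column):
            weights[i] += 1.0 / (distinct * counts[residue])
    total = sum(weights)
    return [w / total for w in weights]  # normalized to sum to 1


def build_pssm(msa, pseudocount=1.0):
    """Return one {residue: log-odds score} dict per alignment column."""
    weights = position_based_weights(msa)
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    n = len(msa)
    pssm = []
    for col in range(len(msa[0])):
        # Weighted residue frequencies for this column.
        freq = {aa: 0.0 for aa in AMINO_ACIDS}
        for seq, w in zip(msa, weights):
            freq[seq[col]] += w
        # Blend observed frequencies with the background, then take log-odds.
        scores = {}
        for aa in AMINO_ACIDS:
            p = (n * freq[aa] + pseudocount * background) / (n + pseudocount)
            scores[aa] = math.log2(p / background)
        pssm.append(scores)
    return pssm


msa = ["ACDE", "ACDE", "ACDE", "GCHK"]
print(position_based_weights(msa))
print(build_pssm(msa)[0]["G"])
```

In the example alignment, the three identical sequences each receive a smaller weight than the single divergent one, so the divergent sequence's residues are not swamped when the column frequencies are computed.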
One disadvantage of PSI-BLAST is that a false positive (a spurious hit with
a low E-value) can be incorporated into the PSSM and corrupt it, which
leads to further spurious results in subsequent iterations. To deal with
this problem, a modified version of PSI-BLAST has been developed that
incorporates composition-based statistics (Schaffer et al. 2001). This
technique significantly improves the accuracy of PSI-BLAST by suppressing
the corruption of constructed PSSMs.
bioinfo.bgu.ac.il/blast/psiblast_cs.html
PHI-BLAST
PHI-BLAST, the pattern-hit initiated BLAST program, is a hybrid strategy
that addresses a question frequently asked by researchers; namely, whether
a particular pattern seen in a protein of interest is likely to be functionally
relevant or occurs simply by chance (Zhang et al. 1998). This question is
addressed by combining a pattern search with a search for statistically
significant sequence similarity. The input to PHI-BLAST consists of an amino
acid or DNA sequence, along with a specific pattern occurring at least once
within the sequence. The pattern consists of a sequence of residues or sets of
residues, with “wild cards” and variable spacing allowed. PHI-BLAST helps
to ascertain the biological relevance of patterns detected within sequences,
and in some cases to detect subtle similarities that escape a regular BLAST