7.4 BLAST 167
Figure 7.3 Comparison of the extreme value distribution with the normal distribution.
distinct HSPs that would have that score or higher entirely by chance. The
expectation value is written E, and the number of such chance HSPs is
approximately Poisson distributed with mean E (Karlin and Altschul 1990;
Altschul 1991). In terms of the normalized score S′, the expectation value is
given by $E = mn\,2^{-S'}$, where m is the size of the query and n is the size
of the database. The expectation value is probably the most useful number in
the BLAST output. The threshold for significance is usually set at either 10%
or 5%. In other words, when E is less than 0.1 or E is less than 0.05, the HSP
is considered to be statistically significant (Altschul et al. 1997).
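As a rough illustration, the formula $E = mn\,2^{-S'}$ can be evaluated directly. The sketch below is a minimal Python version, assuming S′ is already the normalized (bit) score; the function names and the query and database sizes in the example are hypothetical, not part of BLAST itself. It also inverts the formula to find the smallest S′ that brings E down to a chosen cutoff, $S' = \log_2(mn / E_{\text{cutoff}})$.

```python
import math

def expect_value(bit_score: float, m: int, n: int) -> float:
    """Expected number of chance HSPs scoring at least bit_score.

    Implements E = m * n * 2**(-S'), with m the query size, n the database
    size and S' the normalized score (hypothetical helper name).
    """
    return m * n * 2.0 ** (-bit_score)

def is_significant(e_value: float, cutoff: float = 0.05) -> bool:
    """Apply the usual cutoff (0.05 here; 0.1 is the other common choice)."""
    return e_value < cutoff

def min_bit_score(m: int, n: int, cutoff: float = 0.05) -> float:
    """Normalized score at which E equals the cutoff: S' = log2(m * n / cutoff)."""
    return math.log2(m * n / cutoff)

if __name__ == "__main__":
    # Hypothetical sizes: a 300-residue query against a 10**9-residue database.
    m, n = 300, 10**9
    for s in (35.0, 50.0):
        e = expect_value(s, m, n)
        print(f"S' = {s:.0f}  E = {e:.3g}  significant: {is_significant(e)}")
    print(f"S' needed for E < 0.05: above {min_bit_score(m, n):.1f}")
```

Because S′ sits in the exponent, each additional bit of score halves E, while doubling the database size doubles E. The same alignment can therefore look significant against a small database and not against a large one.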
Strictly speaking, the E-value is not a probability, so it should not be used
to determine statistical significance. However, it is easy to convert E to a
probability by using the formula $P = 1 - e^{-E}$, which under the Poisson
approximation is the probability of observing at least one chance HSP. The
P-value is the probability that a search with a random query would produce at
least one HSP at the same score or higher. Table 7.1 shows the relationship
between E and P. For E-values below 0.01, there is essentially no difference
between E and P. The reason is that the Taylor expansion of $e^x$ is
$1 + x + x^2/2! + x^3/3! + \cdots$, so for x close to 0, $e^x \approx 1 + x$.
Taking $x = -E$, when E is close to 0 we get
$P = 1 - e^{-E} \approx 1 - (1 - E) = E$.
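The conversion from E to P, and how quickly the two values converge, can be checked numerically. This is a minimal sketch under the Poisson assumption described above; `e_to_p` is a hypothetical name. It uses Python's `math.expm1`, which computes $e^x - 1$ accurately for small x, so `-math.expm1(-E)` equals $1 - e^{-E}$ without the rounding loss of subtracting two nearly equal numbers.

```python
import math

def e_to_p(e_value: float) -> float:
    """Convert an E-value to a P-value: P = 1 - exp(-E).

    Under the Poisson approximation this is the probability of seeing at
    least one chance HSP when E of them are expected.
    """
    # -expm1(-E) == 1 - exp(-E), but stays accurate when E is very small.
    return -math.expm1(-e_value)

if __name__ == "__main__":
    for e in (10.0, 1.0, 0.1, 0.05, 0.01, 0.001):
        p = e_to_p(e)
        # Relative gap (E - P) / E; the Taylor expansion predicts about E / 2.
        gap = (e - p) / e
        print(f"E = {e:<6}  P = {p:.6f}  relative gap = {gap:.4f}")
```

Carrying the Taylor expansion one term further gives $E - P = E^2/2 - E^3/6 + \cdots$, so the relative difference is roughly E/2: about 0.5% at E = 0.01, which is why the two values are interchangeable in that range.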
The usual way to use BLAST is to find those sequences in a database that
are homologous to a given query sequence. This process compares sequences
in the database with the query sequence, but it does not compare the data-
base sequences with each other. If one wishes to learn about the evolution of