
166 7 Sequence Similarity Searching Tools

like DNA, a protein must fold into a functionally competent 3D structure.

Summary

In addition to BLAST searches for nucleotide and amino acid sequences,
there are search types that take into account the translation from nucleotide
to amino acid.

There are publicly available BLAST web services for searches done with
one sequence at a time.

Clusters of computers are frequently used for performing large batches of
BLAST searches.

7.4.3 Scores and Values

The output of a BLAST search consists of a set of HSPs annotated with various measures of their statistical significance. The score of each HSP is usually denoted by S and is called the raw score. The raw score depends on the various customization parameters of the search, such as the scoring matrix. The normalized score adjusts the raw score so that alignment scores from different searches can be compared (Altschul et al. 1997). The normalized score is

$$S' = \frac{\lambda S - \ln K}{\ln 2},$$

where λ and K are the Karlin-Altschul statistics (Karlin and Altschul 1990, 1993). Dividing by ln 2 expresses the normalized score in bits, a unit borrowed from information theory (Altschul 1991). For this reason S′ is also called the bit score.

The HSP with the largest score is called the maximal-scoring segment pair (MSP). Because the MSP is the best match to the query, it is the most important. One should be careful, however, when using MSP scores from multiple queries. Since the MSP score is a maximum, its probability distribution is given by the extreme value distribution, also known as the Fisher-Tippett or log-Weibull distribution. This distribution is not the same as a normal distribution, even when scores in general are normally distributed. Figure 7.3 shows the extreme value distribution compared with the normal distribution.

Sequence similarity searches are commonly used to determine the function of a sequence by comparing it with sequences whose function is known. Inferring function is reasonable only when the similarity is statistically significant. To determine statistical significance, one compares the actual search result with what would be expected for a search using a random query sequence. The expectation value for a score is the number of
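To make the normalization concrete, here is a minimal Python sketch of the bit-score formula S′ = (λS − ln K)/ln 2, together with picking the MSP out of a set of HSPs. The `HSP` class, the hit labels, their raw scores, and both (λ, K) pairs are hypothetical values chosen only for illustration. Real λ and K depend on the scoring matrix and gap costs and are reported by the search program.

```python
import math
from dataclasses import dataclass


@dataclass
class HSP:
    """A high-scoring segment pair reduced to what this sketch needs: a label and its raw score S."""
    label: str
    raw_score: int


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def maximal_scoring_pair(hsps: list[HSP]) -> HSP:
    """The MSP is simply the HSP with the largest score."""
    return max(hsps, key=lambda h: h.raw_score)


# Hypothetical Karlin-Altschul parameters for one search (illustrative only).
LAMBDA, K = 0.267, 0.041

# Hypothetical hits from a single search.
hits = [HSP("hit_a", 100), HSP("hit_b", 57), HSP("hit_c", 81)]
for h in hits:
    print(f"{h.label}: raw S = {h.raw_score}, bit score S' = {bit_score(h.raw_score, LAMBDA, K):.1f}")

msp = maximal_scoring_pair(hits)
print("MSP:", msp.label)

# The same raw score under a second, hypothetical scoring system gives a different bit score,
# which is why raw scores from different searches should not be compared directly.
print(f"{bit_score(100, 0.318, 0.134):.1f} bits vs {bit_score(100, LAMBDA, K):.1f} bits")
```

Within a single search λ and K are fixed, so ranking by raw score and ranking by bit score select the same MSP. The bit score matters when scores from searches with different scoring systems are compared, as the last line illustrates.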

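The claim that a maximum is not normally distributed, even when the underlying scores are, can be checked with a small simulation. This sketch assumes standard-normal scores and, hypothetically, 200 scores per simulated search. It keeps only the largest score of each search and compares the skewness (asymmetry) of those maxima with the skewness of individual scores. A normal distribution has skewness near 0, whereas the extreme value distribution is skewed to the right, with a long tail of high scores.

```python
import random
import statistics


def sample_skewness(xs):
    """Standardized third moment (population form); close to 0 for a symmetric distribution."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)


def simulate_max_scores(n_searches, scores_per_search, seed=0):
    """For each simulated search, draw standard-normal 'scores' and keep only the maximum."""
    rng = random.Random(seed)
    return [
        max(rng.gauss(0.0, 1.0) for _ in range(scores_per_search))
        for _ in range(n_searches)
    ]


# Individual scores: normally distributed by assumption.
rng = random.Random(1)
plain = [rng.gauss(0.0, 1.0) for _ in range(5000)]

# Maximal scores: one maximum per simulated search of 200 scores (hypothetical size).
maxima = simulate_max_scores(2000, 200)

print(f"skewness of individual scores: {sample_skewness(plain):+.2f}")
print(f"skewness of maximal scores:    {sample_skewness(maxima):+.2f}")
```

The individual scores should show skewness close to zero and the maxima a clearly positive value. The exact numbers depend on the seed and the sample sizes.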