110 5 Survey of Ontologies in Bioinformatics

tools in sequence analysis. PROSITE was developed in 1988 to systematically collect biologically significant patterns (Bairoch 1991). PROSITE signatures are derived from multiple sequence alignments (MSAs) and are expressed with two kinds of descriptor: patterns and generalized profiles (Hulo et al. 2004). Each PROSITE signature is linked to an annotation document that gives the user information about it. To help users relate a signature to three-dimensional (3D) structure, entries link to representative structures in the PDB database. PROSITE is closely related to the SWISS-PROT protein sequence data bank. The PROSITE descriptors and documentation can also be accessed through InterPro, which uses the detailed family annotation provided by PRINTS (Attwood et al. 2003). InterPro (Mulder et al. 2003) provides an integrated view of several domain databases and offers a wide choice of methods for identifying conserved regions. The MSAs are most commonly constructed with ClustalW (Thompson et al. 1994) or T-Coffee (Notredame et al. 2000). However, when the primary sequences are too divergent, it is useful to integrate structural information into the MSAs. In addition, about 3% of the profiles in PROSITE are built with the HMMER hidden Markov model package (Eddy 1998). The PROSITE database is distributed as a text file whose format is defined in a separate document; the format uses a variety of characters (forward slashes, commas, semicolons, etc.) as delimiters.
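PROSITE patterns use a compact notation: `x` stands for any residue, `[...]` lists allowed residues, `{...}` lists forbidden ones, `(n)` or `(n,m)` gives a repeat count, and a leading `<` or trailing `>` anchors the pattern to the N- or C-terminus. The Python sketch below illustrates how a pattern descriptor can be applied to a sequence. It reads the pattern line(s) from a PROSITE-style text entry and translates the pattern into a regular expression. The helper names `prosite_to_regex` and `parse_prosite_entries` and the sample record are hypothetical. The sketch assumes the EMBL-like line layout of the text file (two-letter codes such as `ID` and `PA`, with `//` ending an entry) and covers only the common subset of the pattern syntax. Generalized profiles are scored rather than matched, so this sketch does not handle them.

```python
import re
from typing import Iterator, Tuple

# One pattern element: a residue letter, 'x', [allowed] or {forbidden},
# optionally followed by a repeat count (n) or range (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression.

    Handles only a common subset of the syntax: '-' separators, 'x',
    [..] and {..} classes, (n)/(n,m) repeats, and a leading '<' or
    trailing '>' terminus anchor. Anything else raises ValueError.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            core = "."  # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except these
        else:
            core = residue  # a letter or [..] class is already valid regex
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def parse_prosite_entries(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (entry name, pattern) pairs from PROSITE-style flat-file text.

    Assumes two-letter line codes in columns 1-2 with the value starting
    at column 6, 'PA' lines that may continue a pattern over several
    lines, and '//' closing each entry.
    """
    name, pa_lines = None, []
    for line in text.splitlines():
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            name, pa_lines = value.split(";")[0].strip(), []
        elif code == "PA":
            pa_lines.append(value)
        elif code == "//":
            if name and pa_lines:
                yield name, "".join(pa_lines)
            name, pa_lines = None, []


# A made-up entry for illustration only; it is not a real PROSITE record.
record = """ID   HYPOTHETICAL_MOTIF; PATTERN.
AC   PS00000;
DE   Made-up entry used only to illustrate the layout.
PA   C-x(2,4)-C-x(3)-
PA   [LIVMFYWC]-{P}-H.
//
"""

for name, pattern in parse_prosite_entries(record):
    regex = prosite_to_regex(pattern)
    # finditer reports non-overlapping matches only; positions are 1-based.
    for hit in re.finditer(regex, "MKCAACPGTLAHQQ"):
        print(name, hit.start() + 1, hit.group())
```

Run as written, the script prints a single hit (`HYPOTHETICAL_MOTIF 3 CAACPGTLAH`) for the toy sequence. A regular expression fits here because a PROSITE pattern is essentially a regular expression in a different notation. A fuller scanner would also need the rest of the syntax and would report overlapping matches.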

BLOCKS blocks.fhcrc.org Blocks are ungapped multiple alignments corresponding to the most highly conserved regions of proteins. Searching the BLOCKS database can therefore improve the detection of sequence similarities in searches of sequence databases. The BLOCKS database was introduced to aid in the family classification of proteins (Henikoff and Henikoff 1991). It has proved to be an important resource because hits to BLOCKS entries pinpoint the locations of conserved motifs, which guide further functional characterization (Henikoff et al. 2000). The BLOCKS database can also be used to detect distant relationships (Henikoff et al. 1998). It is the basis for the BLOSUM substitution tables used in amino acid sequence similarity searching, as explained in section 7.1. The BLOCKS database contains 24,294 blocks from nearly 5000 different protein groups (Henikoff et al. 2000). Blocks are available in a variety of formats, including the Blocks, FASTA, and Clustal formats. All of the
