5.4 Structural Databases 109

contacts, is calculated and stored in the database. Database entries are fur-
ther annotated to include information about the overall structural features,
including conformational classes, special structural features, biological func-
tions, and crystal-packing classifications. The NDB has been used to analyze
characteristics of nucleic acids both alone and in complexes with proteins. The
NDB is available in the PDB and mmCIF formats.
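
To illustrate what working with a PDB-format entry involves, the following minimal sketch reads one fixed-column ATOM record. The column positions follow the published PDB format; the `Atom` type, the `parse_atom_line` function, and the example record are hypothetical names and made-up values introduced only for this illustration, not part of the NDB itself.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    name: str
    residue: str
    chain: str
    residue_number: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one fixed-column ATOM/HETATM record of the PDB format."""
    return Atom(
        name=line[12:16].strip(),         # columns 13-16: atom name
        residue=line[17:20].strip(),      # columns 18-20: residue name
        chain=line[21],                   # column 22: chain identifier
        residue_number=int(line[22:26]),  # columns 23-26: residue number
        x=float(line[30:38]),             # columns 31-38: x coordinate
        y=float(line[38:46]),             # columns 39-46: y coordinate
        z=float(line[46:54]),             # columns 47-54: z coordinate
    )


# Made-up record for a phosphorus atom of a guanosine residue.
example = "ATOM      1  P     G A   1      12.345  -6.789   0.123"
atom = parse_atom_line(example)
print(atom.name, atom.residue, atom.chain, atom.residue_number)
```

Because the format is column-based rather than whitespace-delimited, the sketch slices by position; splitting on spaces can fail when adjacent fields run together. The mmCIF format is a tagged, table-like format and would need a different reader.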

5.4.2 Protein Structure Databases


Protein structure databases deal with progressively “higher-order” types of
structure: secondary, tertiary, quaternary, and functional. Protein sequence
information is also a form of structure: the primary structure. A protein
structure database will typically have information about structure on several
levels. Accordingly, we have not attempted a strict classification; instead,
the databases are listed approximately by type of structure, from primary to
functional.

Structural classifications range from short motifs and domains to entire
protein families; they group proteins into classes according to similarities
in secondary or higher-order structure. Functional classifications range from
enzymatic roles to protein interaction networks; they group proteins into
classes according to functional similarities in enzyme reaction mechanisms or
participation in biochemical pathways.

Pfam http://www.sanger.ac.uk/Software/Pfam
The Protein Family database is a large collection of protein families and do-
mains (Bateman et al. 2004). The Pfam database is available in FASTA format.
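
Since the FASTA format recurs throughout the sequence and family databases, a minimal sketch of reading it may be useful. It assumes plain multi-record FASTA text in which each record starts with a ">" header line; the function name `parse_fasta` and the two example records are hypothetical and are not taken from Pfam.

```python
from typing import Dict, Iterable, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each header line (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line  # a sequence may wrap over several lines
    return records


# Made-up two-record example; identifiers and residues are illustrative only.
example = """>seq1 hypothetical domain A
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical domain B
GSHMLEDP
"""

records = parse_fasta(example.splitlines())
for name, sequence in records.items():
    print(name, len(sequence))
```

The sketch keeps the whole header as the key. In practice one would usually split the identifier from the free-text description, and for large files one would yield records one at a time rather than hold them all in memory.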

SMART smart.embl.de
The Simple Modular Architecture Research Tool is a web tool for the identi-
fication and annotation of protein domains, and provides a platform for the
comparative study of complex domain architectures in genes and proteins.
The January 2004 release of SMART contains 685 protein domains. New de-
velopments in SMART are centered on the integration of data from com-
pleted metazoan genomes. SMART can be queried using GO terms (Letunic
et al. 2004).

PROSITE http://www.expasy.org/prosite
PROSITE is a compilation of sites and patterns found in protein sequences
(Sigrist et al. 2002; Hulo et al. 2004). The use of protein sequence patterns
(motifs) to determine the protein function has become one of the essential