
5.4 Structural Databases 111


formats are non-XML text formats.


COG http://www.ncbi.nlm.nih.gov/COG
The database of clusters of orthologous groups of proteins (COGs) attempts
to give a phylogenetic classification of the proteins encoded in 21 complete
genomes of bacteria, archaea, and eukaryotes (Tatusov et al. 2000). The COGs
were constructed by applying the criterion of consistency of genome-specific
best hits to the results of an exhaustive comparison of all protein sequences
from these genomes. The database comprises 2091 COGs that include 56 to
83% of the gene products from each of the complete bacterial and archaeal
genomes and approximately 35% of those from the yeast Saccharomyces
cerevisiae genome. The database is available as a flat file.
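The "consistency of genome-specific best hits" criterion can be illustrated with a simplified sketch. It assumes precomputed pairwise similarity scores (for example, from an all-against-all sequence search, higher meaning more similar). For each protein it keeps the best hit in every other genome, retains only pairs that are each other's best hit, and merges those pairs into clusters with a union-find. The function names, protein IDs, and scores are hypothetical. The published COG procedure is more involved (it starts from triangles of consistent best hits across three genomes and includes manual analysis), so this shows the idea rather than reimplementing the database.

```python
from collections import defaultdict


def genome_specific_best_hits(scores, genome_of):
    """Map (protein, other_genome) -> that protein's best hit in other_genome.

    scores: dict (query, target) -> similarity score (higher is better)
    genome_of: dict protein id -> genome id
    """
    best = {}
    for (q, t), s in scores.items():
        gq, gt = genome_of[q], genome_of[t]
        if gq == gt:
            continue  # only cross-genome comparisons are considered
        key = (q, gt)
        if key not in best or s > best[key][1]:
            best[key] = (t, s)
    return {key: target for key, (target, _) in best.items()}


def cluster_reciprocal_best_hits(scores, genome_of):
    """Simplified clustering: join proteins that are mutual best hits."""
    bets = genome_specific_best_hits(scores, genome_of)
    parent = {p: p for p in genome_of}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (q, _), t in bets.items():
        # Keep the edge only if t's best hit back in q's genome is q itself.
        if bets.get((t, genome_of[q])) == q:
            parent[find(q)] = find(t)

    groups = defaultdict(list)
    for p in genome_of:
        groups[find(p)].append(p)
    # Proteins without any reciprocal partner remain singletons; drop them.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical toy data: three genomes G1, G2, G3 with made-up scores.
genome_of = {"a1": "G1", "a2": "G1", "b1": "G2", "b2": "G2", "c1": "G3"}
pair_scores = {
    ("a1", "b1"): 90, ("a1", "b2"): 40, ("a2", "b2"): 80, ("a2", "b1"): 30,
    ("a1", "c1"): 70, ("b1", "c1"): 75, ("a2", "c1"): 20, ("b2", "c1"): 25,
}
scores = {}
for (x, y), s in pair_scores.items():  # make the toy scores symmetric
    scores[(x, y)] = s
    scores[(y, x)] = s

print(cluster_reciprocal_best_hits(scores, genome_of))
# → [['a1', 'b1', 'c1'], ['a2', 'b2']]
```

In the toy data, a2's best hit in G3 is c1, but c1's best hit in G1 is a1, so that pair fails the consistency check and a2 stays out of the a1/b1/c1 cluster.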


PRINTS umber.sbs.man.ac.uk/dbbrowser/PRINTS
PRINTS is a compendium of protein fingerprints, that is, groups of conserved
motifs used to characterize protein families (Attwood et al. 1999, 2003).
It is available in FASTA format.
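Because PRINTS is described here as available in FASTA format, a minimal reader for that format may be useful. This is a generic sketch, not a PRINTS-specific tool. It assumes each record begins with a `>` header line followed by one or more sequence lines. The function name `read_fasta` and the sample records are invented for illustration, and real PRINTS headers may carry additional fields.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records, not real PRINTS entries.
sample = ">MOTIF1 hypothetical motif\nACDEFG\nHIKL\n>MOTIF2 another\nMNPQ\n"
records = list(read_fasta(io.StringIO(sample)))
print(records)
# → [('MOTIF1 hypothetical motif', 'ACDEFGHIKL'), ('MOTIF2 another', 'MNPQ')]
```

Using a generator keeps memory use proportional to one record, which matters for large sequence files.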


ProDom http://protein.toulouse.inra.fr/prodom/current/html/home.php
ProDom is a comprehensive set of protein domain families automatically
generated from the SWISS-PROT and TrEMBL sequence databases (Servant
et al. 2002).


TIGRFAMs http://www.tigr.org/TIGRFAMs/
The Institute for Genomic Research maintains a database of protein families
based on hidden Markov models (Haft et al. 2003). TIGRFAMs currently
contains over 1600 protein families. It includes models for both full-length
proteins and shorter protein regions grouped at the levels of superfamilies,
subfamilies, and “equivalogs,” homologous protein sets that are functionally
conserved since their last common ancestor. TIGRFAMs is a complementary
database to Pfam, whose models typically have a wider coverage across distant
homologs. The data can be downloaded as a text file.


PDB http://www.rcsb.org/pdb
The Protein Data Bank is the largest source of publicly available biomolecular
3D structures (Bateman et al. 2004). PDB was established at Brookhaven
National Laboratory (BNL) in 1971 as an archive for biological macromolecular
crystal structures. According to the PDB holdings list of 9 September
2003, the PDB contains a total of 22,448 structures, 19,062 of which were
determined by X-ray crystallography and the remaining 3386 by Nuclear
Magnetic Resonance (NMR). Generally speaking, NMR structures are more
problematic than crystallographic ones, because structures in solution are generally
