112 5 Survey of Ontologies in Bioinformatics

more flexible and less stable than those in a crystal. Indeed, solution structures determined from NMR data differ slightly from crystal structures. Therefore, NMR is often used to study small and unusual proteins.

Protein glycosylation is probably the most common and complex type of co- and post-translational modification encountered in proteins (Lutteke et al. 2004). Inspection of the protein databases reveals that 70% of all proteins have potential N-glycosylation sites, that is, the motif Asn-X-Ser/Thr, where X is any residue except Pro (Mellquist et al. 1998). O-glycosylation is even more ubiquitous (Berman et al. 2000). Consequently, PDB entries contain not only protein structures but also pure carbohydrate structures. However, to date, there is no standard nomenclature for carbohydrate residues within PDB files (Westbrook and Bourne 2000). For example, although many monosaccharide residues are defined in the PDB Het Group Dictionary (pdb.rutgers.edu/het_dictionary.txt), the dictionary does not distinguish between the α- and β-forms. Thus, it is difficult for glycobiologists to find relevant carbohydrate structures in the PDB.

The PDB database has two non-XML formats, PDB and mmCIF, which are also used by many other molecular structure databases. Recently, an XML format defined by an XML Schema (XSD), PDBML, has been introduced at the PDB, and its XML files are generated automatically by the data dictionary infrastructure in use there. The current XML schema file is located at deposit.pdb.org/pdbML/pdbx-v1.000.xsd and is also available from the PDB mmCIF resource page at deposit.pdb.org/mmcif/.

SCOP scop.mrc-lmb.cam.ac.uk/scop

The Structural Classification of Proteins database classifies protein domains into groups that share a common ancestor, based on sequence, structural, and functional evidence (Murzin et al. 1995; Andreeva et al. 2004). To understand how multidomain proteins function, it is important to know how they were created during evolution.
Duplication is one of the main sources of new genes and new domains (Lynch and Conery 2000). For examples of this, see section 1.5. In fact, 98% of human protein domains are duplicates (Gough et al. 2001; Madera et al. 2004; Muller et al. 2002). Once a domain or protein has duplicated, it can evolve a new or modified function. Access to SCOP requires a license, and the database is distributed in a non-XML text format.

CATH http://www.biochem.ucl.ac.uk/bsm/cath_new

This database contains domain structures classified into superfamilies and sequence families (Orengo et al. 1997, 2003). Its name stands for Class/Architecture/Topology/Homology. Each structural family is expanded with domain sequence relatives recruited from GenBank using a variety of ef-
