108 5 Survey of Ontologies in Bioinformatics
et al. 2004). It is available in several formats, including FASTA and XML.
The XML format is defined by the DTD at
ftp://ftp.ddbj.nig.ac.jp/database/ddbj/xml/DDBJXML.dtd.
DDBJ cooperates with both EMBL and GenBank.
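As an illustration of the FASTA format mentioned above, the following is a minimal sketch of a reader that collects each record's header and sequence. The function name parse_fasta and the sample records are hypothetical and are not taken from DDBJ; the sketch assumes only the common FASTA convention of a ">" header line followed by one or more sequence lines.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (without the leading '>') to its sequence."""
    records: Dict[str, str] = {}
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical sample data, invented for illustration only.
SAMPLE = """\
>example1 hypothetical record
ACGTACGT
TTGA
>example2 another hypothetical record
GGCC
"""

records = parse_fasta(SAMPLE.splitlines())
for name, seq in records.items():
    print(name, len(seq))
```

An open file handle can be passed in place of SAMPLE.splitlines(), since the function only iterates over lines.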
5.3.2 Protein Sequence Databases
SWISS-PROT au.expasy.org/sprot
SWISS-PROT is the most widely used publicly available protein sequence
database. This database aims to be nonredundant, fully annotated, and highly
cross-referenced (Jung et al. 2001). SWISS-PROT also includes information
on many types of protein modifications. The database is available in both
FASTA and XML formats. The XML format is defined both as a DTD and
using XSD. The XSD schema is at www.uniprot.org/support/docs/uniprot.xsd.
The database itself is available at
ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.xml.gz.
Both SWISS-PROT and TrEMBL are available at this site in a variety of
formats.
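Because the XML release is a single large file, it is usually read as a stream rather than loaded whole. The following is a minimal sketch of that approach using Python's standard library. The element names entry, accession, and sequence, the namespace in the sample, and the sample records themselves are assumptions made for illustration; the XSD remains the authoritative description of the format. The helper names local_name and iter_entries are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_entries(source):
    """Yield (first accession, sequence) for each <entry> element."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "entry":
            continue
        accession = None
        sequence = None
        for child in elem:
            name = local_name(child.tag)
            if name == "accession" and accession is None:
                accession = (child.text or "").strip()
            elif name == "sequence":
                # Remove any whitespace used to wrap the sequence text.
                sequence = "".join((child.text or "").split())
        yield accession, sequence
        elem.clear()  # release the children of the finished entry


# Hypothetical, heavily simplified sample; not real database content.
SAMPLE_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <name>EXAMPLE_HYPOT</name>
    <sequence length="8">MKTAYIAK</sequence>
  </entry>
  <entry>
    <accession>X00002</accession>
    <accession>X00003</accession>
    <sequence length="4">MSEQ</sequence>
  </entry>
</uniprot>
"""

entries = list(iter_entries(io.BytesIO(SAMPLE_XML)))
for accession, sequence in entries:
    print(accession, sequence)
```

For the compressed download, a handle from gzip.open(path, "rb") could be passed to iter_entries in place of the in-memory sample. Matching on the local tag name keeps the sketch independent of the exact namespace URI declared in the file.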
5.4 Structural Databases
Like sequence databases, the structural databases are classified according to
whether they deal with nucleotide structure or protein structure.
5.4.1 Nucleotide Structure Databases
NDB ndbserver.rutgers.edu
The most prominent nucleotide structure database is the Nucleic Acid
Database. NDB was established in 1991 as a resource to assemble and distribute
structural information about nucleic acids (both DNA and RNA) (Berman
et al. 1992). The core of the NDB has been its relational database of nucleic
acid-containing crystal structures. The primary data include the
crystallographic coordinate data, structure factors, and information about the
experiments used to determine the structures, such as crystallization
information, data collection, and refinement statistics. Derived information from
experimental data, including valence geometry, torsion angles, and intermolecular