106 5 Survey of Ontologies in Bioinformatics
MML
The Medical Markup Language provides an XML-based standard for the
exchange and storage of medical data (Guo et al. 2003).
MotifML motifml.org
MotifML is a language for representing computationally predicted DNA
motifs (often in regulatory regions such as promoters) generated by the
Gibbs motif sampler, AlignACE, BioProspector, and CONSENSUS. MotifML
was created by the authors of this book and two collaborators (Sui Huang
and Jerzy Letkowski). MotifML uses the Web Ontology Language (OWL) to
specify the data structure of a MotifML document. MotifML is supported by
Java-based visualization tools such as MotifML viewers.
NeuroML http://www.neuroml.org/main.html
The Neural Open Markup Language is an XML language for describing
models, methods, and literature for neuroscience. NeuroML uses XSD to specify
the syntactic requirements for the model descriptions (Goddard et al. 2001).
ProML
The Protein Markup Language is an open XML standard for specifying protein
sequences, structures, and families. ProML allows machine-readable
representations of key protein features (Hanisch et al. 2002).
TML
The Taxonomic Markup Language is mainly an XML format for representing the
topology of a phylogeny, but it also includes a representation for statistical
metadata (e.g., branch length, retention index, and consistency index)
describing the phylogeny (Gilmour 2000). TML is a natural fit for XML because
the hierarchical structure of a phylogeny maps directly onto nested XML
elements.
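The correspondence between a phylogeny and nested XML elements can be
illustrated with a minimal Python sketch. The element and attribute names
used here (phylogeny, node, name, length) are hypothetical and are not taken
from the actual TML specification; the Newick output format is likewise
chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are assumptions
# made for illustration, not the real TML vocabulary.
DOC = """
<phylogeny>
  <node name="root">
    <node name="A" length="0.1"/>
    <node length="0.2">
      <node name="B" length="0.05"/>
      <node name="C" length="0.07"/>
    </node>
  </node>
</phylogeny>
""".strip()


def to_newick(node):
    """Recursively convert a nested <node> element into a Newick fragment."""
    children = node.findall("node")
    label = node.get("name", "")
    if children:
        # An internal node is written as a parenthesized list of its children.
        label = "(" + ",".join(to_newick(c) for c in children) + ")" + label
    length = node.get("length")
    if length is not None:
        # Branch length is metadata attached to the node.
        label += ":" + length
    return label


def count_leaves(node):
    """Count the terminal taxa beneath a node."""
    children = node.findall("node")
    if not children:
        return 1
    return sum(count_leaves(c) for c in children)


root = ET.fromstring(DOC).find("node")
newick = to_newick(root) + ";"
print(newick)
print(count_leaves(root))
```

Because each clade is simply an element containing its subclades, a single
recursive function suffices to traverse the whole tree.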
5.3 Macromolecular Sequence Databases
The rapid expansion of nucleotide sequence data available in public
databases is revolutionizing biomedical research. Sequence databases such as
GenBank have a variety of uses, including the discovery of novel genes,
identification of homologous genes, analysis of alternative splicing,
chromosomal localization of genes, and detection of polymorphisms (Pandey and
Lewitter 1999). Macromolecular sequence databases are classified according
to whether they deal with nucleotide sequences or protein sequences.