Catalyzing Inquiry at the Interface of Computing and Biology


A feature of ontologies that facilitates the integration of databases is the use of a hierarchical
structure that is progressively specialized; that is, specific terms are defined as specialized forms of
general terms. Two different databases might not extend their annotation of a biological object to the
same level of specificity, but the databases can be integrated by finding the levels within the hierarchy
that share a common term.
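
The integration idea described above can be illustrated with a minimal sketch. It assumes a toy hierarchy in which each term points to a single more general parent. The term names, the gene identifier, and the two database records are hypothetical and chosen only for illustration; they are not taken from any actual ontology release. Real ontologies often allow a term to have several parents, which this sketch does not handle.

```python
# Hypothetical, simplified hierarchy: each term maps to its more general parent.
PARENT = {
    "protein kinase activity": "kinase activity",
    "kinase activity": "transferase activity",
    "transferase activity": "catalytic activity",
    "hydrolase activity": "catalytic activity",
    "catalytic activity": "molecular function",
    "molecular function": None,  # root of this toy hierarchy
}


def lineage(term):
    """Return the term followed by its progressively more general ancestors."""
    path = []
    while term is not None:
        path.append(term)
        term = PARENT[term]
    return path


def shared_term(term_a, term_b):
    """Return the most specific term common to both lineages, or None."""
    ancestors_b = set(lineage(term_b))
    for term in lineage(term_a):  # walk from specific to general
        if term in ancestors_b:
            return term
    return None


# Two hypothetical databases annotate the same gene product at different depths.
db_one = {"geneX": "protein kinase activity"}
db_two = {"geneX": "transferase activity"}

common = shared_term(db_one["geneX"], db_two["geneX"])
print(common)
```

Here the two annotations differ in specificity, yet both records can be linked at the level of the term that the walk up the hierarchy finds first in both lineages.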
The naming dimension of ontologies has been common to research in the life sciences for much of
its history, although the term itself has not been widely used. Chung and Wooley note the following, for
example:



  • The Linnaean system for naming of species and organisms in taxonomy is one of the oldest
    ontologies.

  • The nomenclature committee for the International Union of Pure and Applied Chemistry (IUPAC)
    and the International Union of Biochemistry and Molecular Biology (IUBMB) make recommendations
    on organic, biochemical, and molecular biology nomenclature, symbols, and terminology.

  • The National Library of Medicine Medical Subject Headings (MeSH) provides the most
    comprehensive controlled vocabularies for biomedical literature and clinical records.

  • A division of the College of American Pathologists oversees the development and maintenance
    of a comprehensive and controlled terminology for medicine and clinical information known as
    SNOMED (Systematized Nomenclature of Medicine).

  • The Gene Ontology Consortium^25 seeks to create an ontology to unify work across many
    genomic projects, that is, to develop controlled vocabulary and relationships for gene sequences,
    anatomy, physical characteristics, and pathology across the mouse, yeast, and fly genomes.^26 The
    consortium’s initial efforts focus on ontologies for molecular function, biological process, and
    cellular components of gene products across organisms and are intended to overcome the problems
    associated with inconsistent terminology and descriptions for the same biological phenomena and
    relationships.


Perhaps the most significant drawback of ontologies is that they are in essence standards and hence
take a long time to develop; as the size of the relevant community affected by the ontology increases,
so does development time. For example, the ecological and biodiversity communities have made
substantial progress in metadata standards, common taxonomy, and structural vocabulary with the help
of the National Science Foundation and other government agencies.^27 By contrast, the molecular biology
community is much more diverse, and reaching a community-wide consensus has been much harder.
An alternative to seeking community-wide consensus is to seek consensus in smaller subcommunities
associated with specific areas of research such as sequence analysis, gene expression, protein
pathways, and so on.^28 These efforts usually adopt a use-case and open-source approach for community
input. The ontologies are not meant to be mandatory, but instead to serve as a reference framework
from which further development can proceed.


(^25) See http://www.geneontology.org.
(^26) M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, et al., “Gene Ontology: Tool for the
Unification of Biology,” Nature Genetics 25(1):25–29, 2000. (Cited in Chung and Wooley, 2003.)
(^27) J.L. Edwards, M.A. Lane, and E.S. Nielsen, “Interoperability of Biodiversity Databases: Biodiversity Information on Every
Desk,” Science 289(5488):2312–2314, 2000; National Biological Information Infrastructure (NBII), available at
http://www.nbii.gov/disciplines/systematics.html; Federal Geographic Data Committee (FGDC), available at http://www.fgdc.gov/.
(All cited in Chung and Wooley, 2003.)
(^28) Gene Expression Ontology Working Group, see http://www.mged.org/; P.D. Karp, M. Riley, S.M. Paley, and
A. Pellegrini-Toole, “The MetaCyc Database,” Nucleic Acids Research 30(1):59–61, 2002; P.D. Karp, M. Riley, M. Saier,
I.T. Paulsen, J. Collado-Vides, S.M. Paley, A. Pellegrini-Toole, et al., “The EcoCyc Database,” Nucleic Acids Research
30(1):56–58, 2002; D.E. Oliver, D.L. Rubin, J.M. Stuart, M. Hewett, T.E. Klein, and R.B. Altman, “Ontology Development
for a Pharmacogenetics Knowledge Base,” Pacific Symposium on Biocomputing 65–76, 2002. (All cited in Chung and Wooley, 2003.)
