Catalyzing Inquiry at the Interface of Computing and Biology

COMPUTATIONAL TOOLS 69

An ontology developed by one subcommunity inevitably leads to interactions with related ontolo-
gies and the need to integrate. For example, consider the concept of homology. In traditional evolution-
ary biology, “analogy” is used to describe things that are identical by function and “homology” is used
to identify things that are identical by descent. However, in considering DNA, function and descent are
both captured in the DNA sequence, and therefore to molecular biologists, homology has come to mean
simply similarity in sequence, regardless of whether this is due to convergence or ancestry. Thus, the
term “homologous” means different things in molecular biology and evolutionary biology.^29 More
broadly, a brain ontology will inevitably relate to ontologies of other anatomic structures or, at the
molecular level, will share ontologies for genes and proteins.^30
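
The homology example can be made concrete with a small sketch. The Python below assumes two hypothetical controlled vocabularies, each held as a plain dictionary from term to definition. The names `EVOLUTIONARY_BIOLOGY`, `MOLECULAR_BIOLOGY`, and `find_conflicting_terms` are illustrative only and are not taken from any real ontology resource; the "gene" entry is likewise an invented placeholder. The sketch flags terms that both vocabularies define but define differently, which is the kind of clash an integration effort would have to resolve.

```python
# Two hypothetical controlled vocabularies (term -> definition).
# The "homology" and "analogy" definitions paraphrase the text above;
# the "gene" entry is an invented placeholder.
EVOLUTIONARY_BIOLOGY = {
    "homology": "identical by descent",
    "analogy": "identical by function",
}
MOLECULAR_BIOLOGY = {
    "homology": "similar in sequence, whether from convergence or ancestry",
    "gene": "placeholder definition for illustration",
}


def find_conflicting_terms(vocab_a, vocab_b):
    """Return {term: (definition_a, definition_b)} for every term that
    appears in both vocabularies with a different definition."""
    shared = sorted(vocab_a.keys() & vocab_b.keys())
    return {
        term: (vocab_a[term], vocab_b[term])
        for term in shared
        if vocab_a[term] != vocab_b[term]
    }


conflicts = find_conflicting_terms(EVOLUTIONARY_BIOLOGY, MOLECULAR_BIOLOGY)
for term, (def_a, def_b) in conflicts.items():
    print(f"{term}: '{def_a}' vs. '{def_b}'")
```

A string comparison of definitions is only a stand-in here. Detecting that two differently worded definitions mean the same thing, or that two identical labels mean different things in practice, is the hard part that domain experts would have to supply.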
The difficulties of integrating diverse but related databases are thus transformed into analogous
difficulties in integrating diverse but related ontologies. Because each ontology represents the
integration of multiple databases relevant to its field, integration at this higher level is more
encompassing. It is also more difficult, because changes in fundamental concepts, which will be
necessary in any integration effort, have much more far-reaching implications than analogous changes
in a database. That is, design compromises made in developing individual ontologies might make it
impossible to integrate them without changing some of their basic components. In that case, the
ontologies would have to be taken apart and rebuilt to support integration.
These points relate to semantic interoperability, which is an active area of research in computer
science.^31 Information integration across multiple biological disciplines and subdisciplines would
depend on close collaboration between domain experts and information technology professionals to
develop algorithms and flexible approaches that bridge the gaps between multiple biological ontologies. In
recent years, a number of life science researchers have come to believe in the potential of the Semantic
Web for integrating biological ontologies, as described in Box 4.3.
A sample collection of ontology resources for controlled vocabulary purposes in the life sciences is
listed in Table 4.1.

4.2.8.2 Ontologies for Automated Reasoning

Today, it is standard practice to store biological data in databases; no one would deny that the
volume of available data is far beyond the capabilities of human memory or written text. However, as
the volume of analytic and theoretical results drawn from these data (such as inferred genetic
regulatory, metabolic, and signaling network relationships) grows, it will become necessary to store
such results as well in a format suitable for computational access.
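
As a minimal sketch of what "a format suitable for computational access" might look like, the Python below stores inferred relationships as structured records and round-trips them through JSON. The `InferredRelation` record, its field names, and the entity names are all assumptions made for illustration; the text does not prescribe any particular schema or file format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class InferredRelation:
    """One inferred relationship between two biological entities.
    The schema is hypothetical, chosen only for this example."""
    source: str    # upstream entity (e.g., a regulator)
    target: str    # downstream entity
    kind: str      # "regulatory", "metabolic", or "signaling"
    evidence: str  # free-text pointer to the supporting data


def to_json(relations):
    """Serialize a list of relations to a JSON string."""
    return json.dumps([asdict(r) for r in relations], indent=2, sort_keys=True)


def from_json(text):
    """Rebuild the list of relations from a JSON string."""
    return [InferredRelation(**item) for item in json.loads(text)]


# Invented entity names; not real E. coli genes or metabolites.
relations = [
    InferredRelation("geneA", "geneB", "regulatory", "hypothetical dataset 1"),
    InferredRelation("enzymeX", "metaboliteY", "metabolic", "hypothetical dataset 2"),
]
round_trip = from_json(to_json(relations))
print(round_trip == relations)
```

Keeping the evidence alongside each relationship is one possible design choice: it lets a later reasoning step trace a conclusion back to the data that motivated it.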
The essential rationale underlying automated reasoning is that reasoning one’s way through all of
the complexity inherent in biological organisms is very difficult, and indeed may be, for all practical
purposes, impossible for the knowledge bases that are required to characterize even the simplest organ-
isms. Consider, for example, the networks related to genetic regulation, metabolism, and signaling of an
organism such as Escherichia coli. These networks are too large for humans to reason about in their
totality, which means that it is increasingly difficult for scientists to be certain about global network
properties. Is the model complete? Is it consistent? Does it explain all of the data? For example, the
database of known molecular pathways in E. coli contains many hundreds of connections, far more than
most researchers could remember, much less reason about.
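
The questions of consistency and completeness can be illustrated, in a very reduced form, with the Python sketch below. It assumes a hypothetical model expressed as a list of `(regulator, target, effect)` edges and a hypothetical set of entities seen in experimental data; the gene names are invented and do not refer to real E. coli genes. This is not an automated reasoner, only a pair of set-based checks of the kind such a system might run at much larger scale.

```python
from collections import defaultdict

# Hypothetical model: (regulator, target, effect) edges.
MODEL = [
    ("geneA", "geneB", "activates"),
    ("geneA", "geneB", "represses"),  # deliberately contradicts the edge above
    ("geneB", "geneC", "activates"),
]

# Hypothetical set of entities that appear in the experimental data.
OBSERVED_ENTITIES = {"geneA", "geneB", "geneC", "geneD"}


def find_inconsistencies(edges):
    """Return (regulator, target) pairs that the model assigns
    more than one distinct effect."""
    effects = defaultdict(set)
    for regulator, target, effect in edges:
        effects[(regulator, target)].add(effect)
    return sorted(pair for pair, seen in effects.items() if len(seen) > 1)


def find_unexplained(edges, observed):
    """Return observed entities that appear nowhere in the model."""
    modeled = {name for regulator, target, _ in edges for name in (regulator, target)}
    return sorted(observed - modeled)


print("Inconsistent pairs:", find_inconsistencies(MODEL))
print("Unexplained entities:", find_unexplained(MODEL, OBSERVED_ENTITIES))
```

With three edges, a person can spot the contradiction by eye. The point of doing it mechanically is that the same checks keep working when the network has many hundreds of connections.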

(^29) For more on the homology issue, see W.M. Fitch, “Homology: A Personal View on Some of the Problems,” Trends in Genetics
16(5):227-231, 2000.
(^30) A. Gupta, B. Ludäscher, and M.E. Martone, “Knowledge-Based Integration of Neuroscience Data Sources,” Conference on
Scientific and Statistical Database Management, Berlin, IEEE Computer Society, July 2000. (Cited in Chung and Wooley, 2003.)
(^31) P. Mitra, G. Wiederhold, and M. Kersten, “A Graph-oriented Model for Articulation of Ontology Interdependencies,” Proceedings
of Conference on Extending Database Technology, Konstanz, Germany, March 2000. (Cited in Chung and Wooley, 2003.)
