Catalyzing Inquiry at the Interface of Computing and Biology

COMPUTATIONAL TOOLS 65

4.2.6 Data Mediators/Middleware


In the middleware approach, an intermediate processing layer (a “mediator”) decouples the underlying
heterogeneous, distributed data sources from the client layer of end users and applications.^11 The
mediator layer (i.e., the middleware) performs the core functions of data transformation and
integration, and it communicates with both the database “wrappers” and the user application layer. (A
“wrapper” is a software component associated with an underlying data source; it generally handles
access to the specified data source, extraction and retrieval of selected data, and translation of
source data formats into the common data model designed for the integration system.)
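The division of labor between wrappers and mediator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any system named in this section: the two sources, their field names (`accession`, `gene_symbol`, `species`), the record type `GeneRecord`, and the classes `FlatFileWrapper`, `TableWrapper`, and `Mediator` are all hypothetical. Each wrapper handles access, extraction, and translation for its own source format, so the mediator can query every source through one interface.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterator, Protocol


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical common data model shared by all wrappers."""
    gene_id: str
    symbol: str
    organism: str
    source: str


class Wrapper(Protocol):
    """Interface every wrapper exposes to the mediator."""
    def fetch(self, symbol: str) -> Iterator[GeneRecord]: ...


class FlatFileWrapper:
    """Wraps a hypothetical tab-separated text source: id, symbol, organism."""

    def __init__(self, text: str) -> None:
        self.text = text

    def fetch(self, symbol: str) -> Iterator[GeneRecord]:
        # Access and extraction: parse each line and keep only matching entries.
        for line in self.text.strip().splitlines():
            gene_id, sym, organism = line.split("\t")
            if sym == symbol:
                # Translation into the common data model.
                yield GeneRecord(gene_id, sym, organism, "flatfile")


class TableWrapper:
    """Wraps a hypothetical relational-style source given as row dictionaries."""

    def __init__(self, rows: list[dict[str, str]]) -> None:
        self.rows = rows

    def fetch(self, symbol: str) -> Iterator[GeneRecord]:
        for row in self.rows:
            if row["gene_symbol"] == symbol:
                # Source-specific column names are mapped onto common-model fields.
                yield GeneRecord(row["accession"], row["gene_symbol"], row["species"], "table")


class Mediator:
    """Forwards one query to every wrapper and integrates the results."""

    def __init__(self, wrappers: list[Wrapper]) -> None:
        self.wrappers = wrappers

    def query(self, symbol: str) -> list[GeneRecord]:
        results: list[GeneRecord] = []
        for wrapper in self.wrappers:
            results.extend(wrapper.fetch(symbol))
        return results


if __name__ == "__main__":
    # Invented sample data for illustration only.
    flat = FlatFileWrapper("G1\tABC1\tHomo sapiens\nG2\tXYZ2\tHomo sapiens")
    table = TableWrapper([{"accession": "X9", "gene_symbol": "ABC1", "species": "Mus musculus"}])
    for record in Mediator([flat, table]).query("ABC1"):
        print(record)
```

Because the mediator sees only `GeneRecord` objects, adding a source in this sketch means writing one more wrapper rather than changing client code, which is the decoupling the middleware approach aims for.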
The mediator is responsible for the common model into which data from the underlying sources are
mapped. This model must be rich enough to accommodate the varied formats of existing biological data
sources, which may include unstructured text files, semistructured XML and HTML files, and structured
relational, object-oriented, and nested complex data models. In addition, the internal data model must
support structuring integrated biological objects for presentation to the user application layer.
Finally, the mediator provides services such as filtering, metadata management, and resolution of
semantic inconsistencies among source databases.
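The filtering, metadata, and semantic-reconciliation services can be illustrated with a small Python sketch. It assumes, purely for illustration, that sources disagree on organism names (“human” versus “Homo sapiens”) and on length units (kilobases versus base pairs). The synonym table `ORGANISM_SYNONYMS`, the record fields, and the functions `normalize` and `integrate` are hypothetical; a production mediator would more plausibly draw such mappings from curated vocabularies or ontologies, and would need an explicit policy for conflicting values rather than simply keeping the first one.

```python
from __future__ import annotations

# Hypothetical synonym table the mediator maintains to reconcile source vocabularies.
ORGANISM_SYNONYMS = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def normalize(record: dict) -> dict:
    """Map one source record onto the mediator's canonical vocabulary and units."""
    name = record["organism"].strip()
    organism = ORGANISM_SYNONYMS.get(name.lower(), name)
    length = record["length"]
    # Assumed convention: each source reports length either in "bp" or in "kb".
    length_bp = length * 1000 if record["length_unit"] == "kb" else length
    return {
        "symbol": record["symbol"].upper(),
        "organism": organism,
        "length_bp": length_bp,
        "sources": {record["source"]},  # provenance metadata
    }


def integrate(records: list[dict], organism: str | None = None) -> list[dict]:
    """Normalize records, optionally filter by organism, and merge duplicates."""
    merged: dict[tuple[str, str], dict] = {}
    for rec in map(normalize, records):
        if organism is not None and rec["organism"] != organism:
            continue  # filtering service
        key = (rec["symbol"], rec["organism"])
        if key in merged:
            # Same entity reported by several sources: keep the first record's
            # values and record every contributing source.
            merged[key]["sources"] |= rec["sources"]
        else:
            merged[key] = rec
    return list(merged.values())


if __name__ == "__main__":
    # Invented records from two hypothetical sources, "A" and "B".
    records = [
        {"symbol": "abc1", "organism": "human", "length": 2, "length_unit": "kb", "source": "A"},
        {"symbol": "ABC1", "organism": "Homo sapiens", "length": 2000, "length_unit": "bp", "source": "B"},
        {"symbol": "abc1", "organism": "mouse", "length": 1500, "length_unit": "bp", "source": "B"},
    ]
    print(integrate(records, organism="Homo sapiens"))
```

Here the first two records describe the same entity in different vocabularies and units; after normalization they merge into one entry whose `sources` set records both origins.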
Mediator approaches in the life sciences come in many flavors. IBM’s DiscoveryLink for the life
sciences is one of the best known.^12 The Kleisli system provides an internal nested complex data
model and a powerful query and transformation language for data integration.^13 K2 shares many
design principles with Kleisli in supporting a complex data model, but adopts more object-oriented
features.^14 OPM supports a rich object model and a global schema for data integration.^15 TAMBIS
provides a global ontology (see Section 4.2.8 on ontologies) to facilitate queries across multiple data
sources.^16 TSIMMIS is a mediation system for information integration with its own data model (Object
Exchange Model, OEM) and query language.^17
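The idea of self-describing, schema-less objects that TSIMMIS adopts can be sketched in Python. The sketch is loosely modeled on the OEM notion that each object carries an object identifier, a label, and a value that is either atomic or a set of subobjects (the type component is left implicit in the Python value). The class `OEMObject` and the `path` helper are hypothetical and are not the TSIMMIS data structures or its query language. The sample entry is invented and deliberately irregular: the second gene has no organism subobject, something a fixed relational schema would have to represent with a null.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import count
from typing import Union

# Hypothetical generator of unique object identifiers.
_next_oid = count(1)


@dataclass
class OEMObject:
    """A labeled object whose value is atomic or a list of subobjects."""
    label: str
    value: Union[str, int, float, list[OEMObject]]
    oid: int = field(default_factory=lambda: next(_next_oid))

    @property
    def is_complex(self) -> bool:
        return isinstance(self.value, list)


def path(root: OEMObject, labels: list[str]) -> list:
    """Follow a label path (e.g. ["gene", "name"]) and return the values reached."""
    current = [root]
    for label in labels:
        # Step one level down, keeping only subobjects with the requested label.
        current = [
            child
            for obj in current
            if obj.is_complex
            for child in obj.value
            if child.label == label
        ]
    return [obj.value for obj in current]


if __name__ == "__main__":
    entry = OEMObject("entry", [
        OEMObject("gene", [OEMObject("name", "abc1"), OEMObject("organism", "Homo sapiens")]),
        OEMObject("gene", [OEMObject("name", "xyz2")]),  # no organism subobject
    ])
    print(path(entry, ["gene", "name"]))      # ['abc1', 'xyz2']
    print(path(entry, ["gene", "organism"]))  # ['Homo sapiens']
```

Because labels travel with the data, a query in this sketch can be posed without knowing the full structure in advance, one property that makes such models attractive for integrating semistructured biological sources.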


4.2.7 Databases as Models


A natural progression for databases established to meet the needs and interests of specialized
communities, such as research on cell signaling pathways or programmed cell death, is the evolution of


(^11) G. Wiederhold, “Mediators in the Architecture of Future Information Systems,” IEEE Computer 25(3):38-49, 1992; G.
Wiederhold and M. Genesereth, “The Conceptual Basis for Mediation Services,” IEEE Expert: Intelligent Systems and Their
Applications 12(5):38-47, 1997. (Both cited in Chung and Wooley, 2003.)
(^12) L.M. Haas et al., “DiscoveryLink: A System for Integrated Access to Life Sciences Data Sources,” IBM Systems Journal
40(2):489-511, 2001.
(^13) S. Davidson, C. Overton, V. Tannen, and L. Wong, “BioKleisli: A Digital Library for Biomedical Researchers,” International
Journal of Digital Libraries 1(1):36-53, 1997; L. Wong, “Kleisli, a Functional Query System,” Journal of Functional Programming
10(1):19-56, 2000. (Both cited in Chung and Wooley, 2003.)
(^14) J. Crabtree, S. Harker, and V. Tannen, “The Information Integration System K2,” available at
http://db.cis.upenn.edu/K2/K2.doc; S.B. Davidson, J. Crabtree, B.P. Brunk, J. Schug, V. Tannen, G.C. Overton, and C.J.
Stoeckert, Jr., “K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources,” IBM Systems Journal
40(2):489-511, 2001. (Both cited in Chung and Wooley, 2003.)
(^15) I-M.A. Chen and V.M. Markowitz, “An Overview of the Object-Protocol Model (OPM) and OPM Data Management Tools,”
Information Systems 20(5):393-418, 1995; I-M.A. Chen, A.S. Kosky, V.M. Markowitz, and E. Szeto, “Constructing and Maintaining
Scientific Database Views in the Framework of the Object-Protocol Model,” Proceedings of the Ninth International Conference on
Scientific and Statistical Database Management, Institute of Electrical and Electronics Engineers, Inc., New York, 1997, pp. 237-248.
(Cited in Chung and Wooley, 2003.)
(^16) N.W. Paton, R. Stevens, P. Baker, C.A. Goble, S. Bechhofer, and A. Brass, “Query Processing in the TAMBIS Bioinformatics
Source Integration System,” Proceedings of the 11th International Conference on Scientific and Statistical Database Management, IEEE,
New York, 1999, pp. 138-147; R. Stevens, P. Baker, S. Bechhofer, G. Ng, A. Jacoby, N.W. Paton, C.A. Goble, and A. Brass,
“TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources,” Bioinformatics 16(2):184-186, 2000. (Both cited in
Chung and Wooley, 2003.)
(^17) Y. Papakonstantinou, H. Garcia-Molina, and J. Widom, “Object Exchange Across Heterogeneous Information Sources,”
Proceedings of the IEEE Conference on Data Engineering, IEEE, New York, 1995, pp. 251-260. (Cited in Chung and Wooley, 2003.)
