9.5 Automating Transformations
There are many names for the process of discovering transformations.
Reconciling differing terminology in various ontologies is called ontology
mediation. For relational databases, the problem is called schema
integration, for which there is a large literature. See, for example, Rahm
and Bernstein (2001) for a survey of schema integration tools. Similar
structures and concepts that appear in multiple schemas are called
“integration points” (Bergamaschi et al. 1999). When data from a variety of
sources are transformed into a single target database, the process is called
data warehousing. Data warehousing for relational databases is an entire
industry, and many data warehousing companies now also support XML. If a
query using one vocabulary is rewritten so as to retrieve data from various
sources, each of which uses its own vocabulary, the process is called
virtual data integration. Another name for this process is query discovery
(Embley et al. 2001; Li and Clifton 2000; Miller et al. 2000).
Ontology mediation and transformation depend on identifying semantically
corresponding elements in a set of schemas (Do and Rahm 2002; Madhavan et
al. 2001; Rahm and Bernstein 2001). This is a difficult problem to solve
because terminology for the same entities from different sources may use
very different structural and naming conventions. Conversely, the same name
can be used for elements whose meanings differ, for example in units,
precision, resolution, or measurement protocol. It is usually necessary to
annotate an ontology with auxiliary information to help determine the
meaning of its elements, but ontology mediation and transformation are
difficult to automate even with this additional information.
For example, in ecology, the species density is the ratio of the number of
species to the area. One schema might have a species density element, while
another might have separate elements for the species count and the area. As
another example, in the health study example in section 9.1, the BMI
attribute is the ratio of the weight to the square of the height. Another
database might have only the weight and height, and these attributes might
use different units than in the first database. Consequently, a single
element in one schema may correspond to multiple elements in another. In
general, the correspondence between elements is many-to-many: a group of
elements in one schema may correspond to a group of elements in another.
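
The many-to-one correspondences described above can be made concrete with a
minimal sketch in Python. The field names (weight_lb, height_in,
species_count, area_km2) and the choice of units are hypothetical and are
not taken from the health study or from any real schema. The sketch only
illustrates that a single target element may have to be computed from
several source elements, possibly after a unit conversion.

```python
# Hypothetical source records: the source database is assumed to store
# weight in pounds and height in inches, while the target expects kg/m^2.
LB_TO_KG = 0.45359237   # kilograms per pound (exact by definition)
IN_TO_M = 0.0254        # meters per inch (exact by definition)


def bmi_from_source(record):
    """Derive the single target element BMI (kg/m^2) from two source elements."""
    weight_kg = record["weight_lb"] * LB_TO_KG
    height_m = record["height_in"] * IN_TO_M
    return weight_kg / height_m ** 2


def species_density(record):
    """Derive species density (species per unit area) from count and area."""
    if record["area_km2"] <= 0:
        raise ValueError("area must be positive")
    return record["species_count"] / record["area_km2"]


# Usage with made-up values.
person = {"weight_lb": 154, "height_in": 70}
plot = {"species_count": 30, "area_km2": 12}
print(round(bmi_from_source(person), 1))
print(species_density(plot))
```

A matcher that only pairs element names could not produce either function:
each one encodes a formula and a unit convention, which is exactly the kind
of auxiliary information that must be supplied by annotation or by a person.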
Many tools for automating ontology mediation have been proposed and
some research prototypes exist. There are also some commercial products for
relational schema integration in the data warehousing industry. However,
these tools mainly help discover simple one-to-one matches, and they do
not consider the meaning of the data or how the transformation will be used.
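
To illustrate what a simple one-to-one match looks like, the following is a
minimal sketch of a name-based matcher. It is not the algorithm of any tool
cited in this section; the greedy strategy, the 0.6 threshold, and the two
schemas are assumptions made for illustration. It uses only the string
similarity provided by Python's standard difflib module.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """String similarity in [0, 1] between two element names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_one_to_one(source_names, target_names, threshold=0.6):
    """Greedily pair each source name with its most similar unused target name.

    Returns a dict mapping source name -> target name. Names whose best
    similarity falls below the (arbitrary) threshold are left unmatched.
    """
    candidates = sorted(
        ((name_similarity(s, t), s, t)
         for s in source_names for t in target_names),
        reverse=True,
    )
    matches, used_targets = {}, set()
    for score, s, t in candidates:
        if score < threshold:
            break  # remaining candidates are even less similar
        if s not in matches and t not in used_targets:
            matches[s] = t
            used_targets.add(t)
    return matches


# Hypothetical schemas: the first stores BMI, the second only weight and height.
schema_a = ["patient_id", "bmi"]
schema_b = ["PatientID", "weight", "height"]
matches = match_one_to_one(schema_a, schema_b)
print(matches)
```

Under these assumptions, patient_id is paired with PatientID because the
names are nearly identical, while bmi is left unmatched: nothing in the
names indicates that it corresponds to a combination of weight and height.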