198 9 The Transformation Process
- Attributes need to be changed into elements, or vice versa.
- One must infer new information.
- Several documents should be merged into a single document or a single
document should be split into several.
- Information must be selected from one or more documents. This is essentially
the same as querying.
- An entirely different kind of document is required, such as an HTML document
suitable for a web browser, a comma-separated values (CSV) file
suitable for a spreadsheet, even a LaTeX file suitable for typesetting.
- Element information has to be combined. Processing can range from relatively
simple operations such as computing totals and averages to using
sophisticated algorithms.
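As a concrete illustration of the first kind of transformation in this list, the following minimal sketch changes every attribute into a child element. It is written in Python with the standard xml.etree.ElementTree module, which is a choice made here for brevity. The function name attributes_to_elements and the sample element names are hypothetical and serve only as an example.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(xml_text: str) -> str:
    """Return a copy of the document in which every attribute has
    become a child element of the element that carried it."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of the elements first, so that the children
    # created below are not visited by the loop.
    for elem in list(root.iter()):
        for name, value in list(elem.attrib.items()):
            child = ET.SubElement(elem, name)  # appended after existing children
            child.text = value
        elem.attrib.clear()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical input document, used only for illustration.
    source = '<sample id="s1" organism="yeast"><note>ok</note></sample>'
    print(attributes_to_elements(source))
```

The sketch assumes that no attribute shares a name with an existing child element. A real transformation would have to decide how to resolve such a clash.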
Transformation is performed by means of a program. There are many
programming languages that can be used for transformation, and there are
many variations on how the transformation process can be carried out. The
one that is best will depend not only on the nature of the transformation but
also on one’s background and experience.
Traditional programming languages such as Perl and Java can be used for
XML transformation. If one is already familiar with one of these languages,
then it might be best to stay with it. Even so, there are two distinctly different
approaches to transformation using traditional programming languages. A
third possibility that is becoming increasingly popular is to use a rule-based
language specifically designed for XML transformation. We now discuss
each of these three approaches.
The first approach is called event-based parsing or, more succinctly, parsing.
The parser reads the document as input and identifies the interesting events, such
as the beginning of an element, the end of an element, the content of an element,
and so on. The events occur in exactly the same order as they appear
in the document. When each event occurs, a corresponding procedure is
called, and the features of the event are available as parameters. The procedures
that are called form the application programming interface (API).
Event-based parsing for XML most commonly uses the Simple API for XML
(SAX). For example, when the beginning of an element is encountered, the
startElement procedure is called. The parameters include the name of
the element and its attributes. This approach is covered in detail for Perl in
subsections 10.2.2 and 10.2.5.
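The event-based style can be sketched briefly as follows. The example uses Python's standard xml.sax module rather than Perl, purely to keep the illustration short. The class EventLogger, the function parse_events, and the sample document are hypothetical names introduced for this sketch. The handler methods startElement, characters, and endElement are the callbacks that the SAX parser invokes as it meets each event.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event in the order in which it occurs."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called at the beginning of an element; the parameters are the
        # element name and its attributes.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Called for element content. A parser may deliver one run of
        # text in several pieces; whitespace-only pieces are skipped here.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        # Called at the end of an element.
        self.events.append(("end", name))


def parse_events(xml_text: str):
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    # Hypothetical input document, used only for illustration.
    for event in parse_events('<doc><item id="1">alpha</item></doc>'):
        print(event)
```

The program never holds the whole document in memory. It sees only one event at a time, so any state needed across events (here, the list of events) must be kept by the handler itself.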