
198 9 The Transformation Process



  1. Attributes need to be changed into elements, or vice versa.

  2. One must infer new information.

  3. Several documents should be merged into a single document or a single
    document should be split into several.

  4. Information must be selected from one or more documents. This is
    essentially the same as querying.

  5. An entirely different kind of document is required, such as an HTML
    document suitable for a web browser, a comma-separated values (CSV) file
    suitable for a spreadsheet, or even a LaTeX file suitable for typesetting.

  6. Element information has to be combined. Processing can range from
    relatively simple operations such as computing totals and averages to
    using sophisticated algorithms.

Transformation is performed by means of a program. There are many
programming languages that can be used for transformation, and there are
many variations on how the transformation process can be carried out. The
best choice depends not only on the nature of the transformation but also
on one’s background and experience.
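
To make the first kind of transformation in the list above concrete, here is a
minimal sketch that turns attributes into child elements. It uses Python's
standard xml.etree.ElementTree module, chosen only for brevity; the function
name attributes_to_elements and the sample <item> document are hypothetical
and do not come from the text.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(element):
    """Replace every attribute of `element` and its descendants
    with a child element of the same name (illustrative helper)."""
    # Recurse into the children that exist before any new ones are added.
    for child in list(element):
        attributes_to_elements(child)
    # Sort the attribute names so the output order is deterministic.
    for name in sorted(element.attrib):
        new_child = ET.SubElement(element, name)
        new_child.text = element.attrib[name]
    element.attrib.clear()
    return element


# Hypothetical input document, used only for illustration.
source = '<item id="42" unit="mg"/>'
root = attributes_to_elements(ET.fromstring(source))
result = ET.tostring(root, encoding="unicode")
print(result)  # <item><id>42</id><unit>mg</unit></item>
```

The reverse transformation (elements into attributes) would follow the same
pattern, but it only makes sense for child elements that contain plain text.
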

Traditional programming languages such as Perl and Java can be used for
XML transformation. If one is already familiar with one of these languages,
then it might be best to stay with it. Even so, there are two distinctly
different approaches to transformation using traditional programming
languages. A third possibility that is becoming increasingly popular is to
use a rule-based language specifically designed for XML transformation. We
now discuss each of these three approaches.

The first approach is called event-based parsing or, more succinctly,
parsing. A parser reads the document as input and identifies the interesting
events, such as the beginning of an element, the end of an element, the
content of an element, and so on. The events are reported in exactly the same
order as the corresponding items appear in the document. When each event
occurs, a corresponding procedure is called, and the features of the event
are available as parameters. The procedures that are called form the
application programming interface (API). Event-based parsing for XML most
commonly uses the Simple API for XML (SAX). For example, when the beginning
of an element is encountered, the startElement procedure is called. The
parameters include the name of the element and its attributes. This approach
is covered in detail for Perl in subsections 10.2.2 and 10.2.5.
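
The event-driven style described above can be sketched with Python's standard
xml.sax module, whose ContentHandler class happens to use the same
startElement name. This is only an illustration, not the Perl interface
covered in chapter 10; the EventRecorder class and the sample <order>
document are hypothetical.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record each parsing event in the order the parser reports it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Called at the beginning of an element; the parameters are the
        # element name and its attributes.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Called for element content. A parser may deliver one run of text
        # in several pieces; whitespace-only pieces are skipped here.
        if content.strip():
            self.events.append(("text", content.strip()))

    def endElement(self, name):
        # Called at the end of an element.
        self.events.append(("end", name))


# Hypothetical input document, used only for illustration.
document = b'<order id="7"><item>tea</item><item>milk</item></order>'

recorder = EventRecorder()
xml.sax.parseString(document, recorder)
for event in recorder.events:
    print(event)
```

The recorded list begins with the start of order (with its id attribute) and
ends with the end of order, mirroring document order. A transformation
program would write output from inside these procedures rather than merely
recording the events.
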
