236 10 Transforming with Traditional Programming Languages
Once bioinformatics data have been represented in an XML format, they can
be transformed using a wide variety of tools. In keeping with the TMTOWTDI
philosophy of Perl, there are many ways to transform XML using Perl. Here
are some of the Perl modules that can be used to process XML documents:
- XML::Parser provides one with the ability to process XML one element
at a time. It is analogous to reading a file one line at a time. However,
because elements can contain other elements, it is important to know not
only when one starts reading an element but also when an element is
finished. This process is similar to the pattern-matching programs in
subsection 10.1.4 such as program 10.12. The XML parser looks for the
patterns that indicate when an element begins and when an element ends.
- XML::DOM is analogous to program 10.5 in subsection 10.1.2. Instead of
processing the document one line at a time, the entire document is read
into a single data structure, and one is free to examine the parts in
whatever order is convenient. Of course, XML has a hierarchical document
structure, so the Perl data structure will also be hierarchical.
- XML::XPath organizes the document like a directory of files, exactly as in
section 8.1.
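
The idea of watching for the patterns that mark where an element begins and
where it ends can be sketched in a few lines of plain Perl. This is only a
toy illustration and not how XML::Parser itself is used: a single regular
expression stands in for the parser, the sample document and the names
@events and @open are made up for this sketch, and comments, CDATA sections,
and processing instructions are ignored.

```perl
use strict;
use warnings;

# A small made-up document; the second gene element is an empty element.
my $xml = '<genes><gene id="g1"><name>BRCA1</name></gene><gene id="g2"/></genes>';

my @events;   # what was seen, in document order
my @open;     # stack of elements that have begun but not yet ended

# Match a start tag, an end tag, or an empty-element tag.
while ($xml =~ m{<(/?)([A-Za-z_][\w:.-]*)[^<>]*?(/?)>}g) {
    my ($closing, $name, $empty) = ($1, $2, $3);
    if ($closing) {
        # An end tag finishes the most recently opened element.
        pop @open;
        push @events, "end $name";
    }
    else {
        # A start tag begins a new element nested inside the open ones.
        push @events, "start $name";
        push @open, $name;
        if ($empty) {
            # An empty element begins and ends in the same tag.
            pop @open;
            push @events, "end $name";
        }
    }
}

print "$_\n" for @events;
```

Because elements nest, the sketch keeps a stack of the elements that are
currently open. When the whole document has been read, the stack should be
empty again.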
Summary
- A Perl module groups together scalars, arrays, hashes and procedures as
a single unit.
- The cpan command, or its equivalent, can be used to install Perl modules
that have been published on the CPAN website.
- The -> operator refers to one of the items in a module.
- Perl modules are available for processing and querying XML documents.
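
The first and third points can be illustrated with a minimal sketch. The
module name SeqTools and everything in it are hypothetical, and the module is
defined in the same file only so that the example is self-contained. A
published module would normally live in its own file and be loaded with a
use statement.

```perl
use strict;
use warnings;

# Hypothetical module, defined inline for illustration only.
package SeqTools;

our $VERSION    = '0.01';                                      # a scalar
our @ALPHABET   = qw(A C G T);                                 # an array
our %COMPLEMENT = (A => 'T', T => 'A', C => 'G', G => 'C');    # a hash

# A procedure. When it is called with ->, the module name is passed
# as the first argument.
sub reverse_complement {
    my ($class, $seq) = @_;
    return join '', map { $COMPLEMENT{$_} } reverse split //, uc $seq;
}

package main;

# The -> operator selects the procedure from the module.
my $rc = SeqTools->reverse_complement('GATTACA');
print "$rc\n";

# Variables that belong to a module can be named with the :: notation.
print "SeqTools version $SeqTools::VERSION\n";
```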
10.2.2 Processing XML Elements
The simplest way to process XML is to read the document one element at
a time. This is analogous to reading a file one line at a time, as in
program 10.1 of subsection 10.1.1. Processing an XML document is called
parsing, which is the term that computer scientists use for processing any
computer language. There is a Perl module that will parse XML documents