236 10 Transforming with Traditional Programming Languages
Once bioinformatics data have been represented in an XML format, they can
be transformed using a wide variety of tools. In keeping with the TMTOWTDI
philosophy of Perl, there are a great number of ways to transform XML using
Perl. Here are some of the Perl modules that can be used to process XML
documents:
- XML::Parser provides one with the ability to process XML one element
  at a time. It is analogous to reading a file one line at a time. However,
  because elements can contain other elements, it is important to know not
  only when one starts reading an element but also when an element is
  finished. This process is similar to the pattern-matching programs in
  subsection 10.1.4 such as program 10.12. The XML parser looks for the
  patterns that indicate when an element begins and when an element ends.
- XML::DOM is analogous to program 10.5 in subsection 10.1.2. Instead of
  processing the document one line at a time, the entire document is read
  into a single data structure, and one is free to examine the parts in
  whatever order is convenient. Of course, XML has a hierarchical document
  structure, so the Perl data structure will also be hierarchical.
- XML::XPath organizes the document like a directory of files, exactly as in
  section 8.1.
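
The element-at-a-time style described for XML::Parser can be sketched as
follows. This is only a minimal illustration, assuming that XML::Parser has
been installed from CPAN. The sample document, its element names, and the
names start_element, end_element, $depth and @events are all hypothetical
and were chosen just for this sketch. The parser calls one procedure when an
element begins and another when it ends, and a counter keeps track of how
deeply the current element is nested.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;    # assumed to be installed from CPAN

# Hypothetical sample document; the element and attribute names are
# invented for illustration only.
my $xml = <<'END_XML';
<Interview>
  <Subject name="Joe">
    <Weight>85</Weight>
  </Subject>
  <Subject name="Ann">
    <Weight>62</Weight>
  </Subject>
</Interview>
END_XML

my $depth  = 0;     # number of elements that are currently open
my @events = ();    # what the parser reported, in document order

# Called by the parser each time an element begins.
sub start_element {
    my ($expat, $name, %attributes) = @_;
    push @events, ("  " x $depth) . "start $name";
    $depth++;
}

# Called by the parser each time an element ends.
sub end_element {
    my ($expat, $name) = @_;
    $depth--;
    push @events, ("  " x $depth) . "end $name";
}

# Register the two procedures as handlers and parse the document.
my $parser = XML::Parser->new(
    Handlers => { Start => \&start_element, End => \&end_element });
$parser->parse($xml);

print "$_\n" foreach @events;
```

Each line that is printed is indented according to how many enclosing
elements were still open when the event occurred, which makes the
difference from line-at-a-time processing visible: the end of an inner
element is always reported before the end of the element that contains it.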
Summary
- A Perl module groups together scalars, arrays, hashes and procedures as
  a single unit.
- The cpan command, or its equivalent, can be used to install Perl modules
  that have been published on the CPAN website.
- The -> operator refers to one of the items in a module.
- Perl modules are available for processing and querying XML documents.
10.2.2 Processing XML Elements
The simplest way to process XML is to read the document one element at
a time. This is analogous to reading a file one line at a time, as in
program 10.1 of subsection 10.1.1. Processing an XML document is called
parsing, which is the term that computer scientists use for processing any
computer language. There is a Perl module that will parse XML documents