204 10 Transforming with Traditional Programming Languages
ties for what a program can look like enormously. It can also make some Perl
programs difficult for a person to read even when the Perl compiler has no
difficulty with them. Except for some common Perl motifs and one example in
section 10.1 below, the examples in this chapter use a programming style that
emphasizes readability over cleverness as much as possible.
Some of the most common programming tasks can be classified as
transformations. Even statistical computations are a form of data
transformation. To organize the transformation tasks, the world of data will be
divided into XML files and text files. The text file category includes flat
files as well as the text produced by many bioinformatics tools. This lumps
together many very different formats, but it is convenient for classification
purposes. File formats that require specialized software for their
interpretation (such as PDF, Word, and spreadsheet formats) will not be
considered unless the format can be converted to either an XML file or a text
file.
The first section of the chapter deals with non-XML text processing, and
the second section deals with XML processing. Many techniques from the first
section reappear in the second, but some new notions are also required.
10.1 Text Transformations
The subsections of this part of the chapter deal with increasingly complex
data and transformations of the data. The first two subsections consider data
having a uniform structure, as in flat files and database tables. The first
subsection shows how to process such information one line or record at a
time; the second introduces arrays, which allow one to process the
information in some fashion other than as it is received. The third
subsection is an interlude between the first two and the last two subsections.
It covers procedures, which are important for organizing programs as they
get larger. The last two subsections consider data with more complicated
structures. The fourth subsection shows how to extract information from
complicated text, processing it as it is extracted. The fifth introduces
data structures, which allow one to process complicated data in some fashion
other than as it is extracted.
Perl can be invoked in many ways, but one of the most common is to use
a command such as this:
perl program.perl file.txt > result.txt
where program.perl is the name of the file containing the Perl program