
230 10 Transforming with Traditional Programming Languages



  • When a pattern matches, Perl extracts the text that matches the whole
    pattern as well as text that matches each subpattern.
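
As a small illustration of this last point, the sketch below matches one line against a pattern with two parenthesized subpatterns. The sample line and the variable names are chosen only for this example. Perl's built-in variable $& holds the text matched by the whole pattern, while $1 and $2 hold the text matched by the first and second subpatterns.

```perl
use strict;
use warnings;

# Sample input line, chosen only for illustration.
my $line = "number of sequences = 19";

my ($whole, $label, $count);

# Two subpatterns: the label before " = " and the integer after it.
if ($line =~ /^(.+?) = (\d+)$/) {
    $whole = $&;   # text matched by the whole pattern
    $label = $1;   # text matched by the first subpattern
    $count = $2;   # text matched by the second subpattern
}

print "whole: $whole\n";
print "label: $label\n";
print "count: $count\n";
```

In practice the numbered variables are used far more often than $&, since they pick out exactly the pieces of the line that the transformation needs.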


10.1.5 Perl Data Structures


While pattern matching is a powerful feature for finding information in an
input file, it is not enough by itself when the information is arranged in a
different order than is needed by the transformation task. Consider the
following excerpt from the output produced by the CONSENSUS (Stormo and
Hartzell III 1989; Hertz et al. 1990; Hertz and Stormo 1999) motif-finding
program:
MATRIX 2
number of sequences = 19
unadjusted information = 13.2069
sample size adjusted information = 12.0373
ln(p-value) = -198.594 p-value = 5.64573E-87
ln(expected frequency) = -57.9937 expected frequency = 6.51143E-26
A |  0  0  0 18 16  0  7  0  0  0  0  4  0 19
T | 18  0  0  0  1  8  3 15 19  3  2  0  0  0
C |  1  0  2  1  0  0  6  4  0 16 12  7  0  0
G |  0 19 17  0  2 11  3  0  0  0  5  8 19  0

This excerpt shows the probability distributions for one motif (labeled
“MATRIX 2”). There are two ways in which this file differs from what is
necessary for the task. First, the distributions are given in terms of
frequencies rather than probabilities. Second, the frequencies are listed by DNA base
rather than by position in the motif. The first difference is easy to fix: one can
just divide by the total number of sequences. The second difference is not so
easily handled because the information has the wrong arrangement.
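
Both fixes can be sketched in a few lines of Perl. The sketch is only one possible approach, and it rests on two assumptions: the counts in each row are separated by whitespace, and the "number of sequences" line appears before the rows. The name @motif is hypothetical. Each count is divided by the total and stored under its position first and its base second, so the data ends up arranged by position.

```perl
use strict;
use warnings;

# Lines from the excerpt, assuming whitespace-separated counts.
my @lines = (
    "number of sequences = 19",
    "A | 0 0 0 18 16 0 7 0 0 0 0 4 0 19",
    "T | 18 0 0 0 1 8 3 15 19 3 2 0 0 0",
    "C | 1 0 2 1 0 0 6 4 0 16 12 7 0 0",
    "G | 0 19 17 0 2 11 3 0 0 0 5 8 19 0",
);

my $total;   # number of sequences, read from the input
my @motif;   # $motif[$position]{$base} = probability

for my $line (@lines) {
    if ($line =~ /^number of sequences = (\d+)/) {
        $total = $1;
    }
    elsif ($line =~ /^([ATCG])\s*\|\s*(.*)$/) {
        my ($base, $rest) = ($1, $2);
        my @counts = split ' ', $rest;
        # Indexing by position first, then base, rearranges the data.
        for my $pos (0 .. $#counts) {
            $motif[$pos]{$base} = $counts[$pos] / $total;
        }
    }
}

# Print one line per motif position, with the probability of each base.
for my $pos (0 .. $#motif) {
    printf "%2d", $pos + 1;
    printf "  %s=%.3f", $_, $motif[$pos]{$_} for qw(A T C G);
    print "\n";
}
```

Nothing can be printed until all four rows have been read, because every output line needs one value from each row.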
To rearrange information obtained from an input file, it is necessary to
store information from several lines before printing it. This would be easy
if the information consisted of a few scalars, but it gets much more complicated
when substantial amounts of data must be organized. The technique
for doing this in programming languages is called a data structure. Some data
structures have already been used; namely, arrays and hashes. These are the
simplest data structures. One constructs more complex data structures by using
a technique called nesting. A nested data structure is a data structure whose
items are themselves data structures. For example, one can have an array of
hashes, or a hash of arrays, or a hash of hashes of arrays, and so on. There is
no limit to how deeply nested a data structure can be. The special case of an
array of arrays was already developed in subsection 10.1.2. Data structures
extend the concept of multidimensional array to allow for dimensions that