10.1 Text Transformations 233
Given the label for a motif, one can obtain the motif by using the label
as the key: $motifs{$label}. However, this is a scalar (a reference to the
hash), not the hash of DNA positions itself. This is the trickiest part of
the program. To get the hash of DNA positions, one must use the expression
%{$motifs{$label}}. This may seem mysterious at first, but it all makes sense
once one finds out that every use of the prefixes $, %, and @ is actually
supposed to look like this, with the name or expression enclosed in braces.
Omitting the braces is an abbreviation that one can use for simple variable
names.
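The difference between the reference and the hash it points to can be seen in a minimal sketch. The %motifs structure below, and its probabilities, are hypothetical illustrations (the program's actual data are not shown in this section): each label maps to a reference to a hash of positions, and each position maps to a reference to a hash of bases.

```perl
use strict;
use warnings;

# Hypothetical data: label => ref to hash of positions,
# position => ref to hash of base => probability (made-up values).
my %motifs = (
    1 => {
        1 => { A => 0.25, C => 0.25, T => 0.25, G => 0.25 },
        2 => { A => 0.5,  C => 0,    T => 0.5,  G => 0 },
    },
);

my $label = 1;

# $motifs{$label} is a scalar holding a reference to the hash of positions.
my $ref = $motifs{$label};
print ref($ref), "\n";                  # prints HASH

# Braces around the expression dereference it to get the hash itself.
my %positions = %{$motifs{$label}};
print scalar(keys %positions), "\n";    # prints 2

# For a simple variable name the braces may be omitted:
# %$ref is an abbreviation of %{$ref}.
my %same = %$ref;
print join(' ', sort keys %same), "\n"; # prints 1 2
```

The braces cannot be dropped from %{$motifs{$label}}, because $motifs{$label} is an expression rather than a simple variable name.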
Once the hash for one motif has been obtained, one just loops over the
positions and then over the four bases. The program explicitly writes out the
DNA bases, because it is printing them in an order that is not alphabetical.
After printing the probability distribution for a position, the program
prints a newline to end the line. The output of the program will look
something like this:
Probability distributions for motif 1
A 0.037037037037037 C 0.111111111111111 T 0.851851851851852 G 0
A 0.037037037037037 C 0.037037037037037 T 0 G 0.925925925925926
A 0 C 0.62962962962963 T 0.185185185185185 G 0.185185185185185
...
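The two nested loops can be sketched as follows. This is a hypothetical reconstruction, not the program from the text: the %motifs values are assumptions chosen so that the first motif reproduces the output lines above (they are all multiples of 1/27), and the lines are collected in a string before printing so that they are easy to check.

```perl
use strict;
use warnings;

# Hypothetical %motifs: label => ref to hash of positions,
# position => ref to hash of base => probability.
my %motifs = (
    1 => {
        1 => { A => 1/27, C => 3/27,  T => 23/27, G => 0 },
        2 => { A => 1/27, C => 1/27,  T => 0,     G => 25/27 },
        3 => { A => 0,    C => 17/27, T => 5/27,  G => 5/27 },
    },
);

my $output = '';
foreach my $label (sort keys %motifs) {
    $output .= "Probability distributions for motif $label\n";
    my %motif = %{$motifs{$label}};    # dereference to get the positions
    foreach my $position (sort { $a <=> $b } keys %motif) {
        # The bases are written out explicitly: A C T G is not alphabetical.
        foreach my $base ('A', 'C', 'T', 'G') {
            $output .= "$base $motif{$position}{$base} ";
        }
        $output .= "\n";               # end the line for this position
    }
}
print $output;
```

Positions are sorted numerically with `<=>` because Perl returns hash keys in no particular order.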
By default, Perl prints a number with up to 15 significant digits. In many
cases the numbers will have far more decimal places than the data merit.
To specify the exact number of decimal places that should be printed, one
should use the printf statement. It would look like this:
printf('%s %5.3f ', $base, $motif{$position}{$base});
The first parameter of the printf statement is called the format. Its purpose
is to specify what kinds of data are to be printed as well as the precise format
to use for each one. Each format specification begins with a percent sign.
This use of the percent sign has no connection with the notion of a Perl hash.
The %s format means that the variable is to be printed verbatim. The s stands
for “string.” The %5.3f format means that the variable is to be printed as a
number rounded to three digits after the decimal point, in a field at least five
characters wide (including the decimal point); a shorter result is padded with
spaces on the left. The f stands for “floating-point number.” Using this
format, the output of the program would look like this:
Probability distributions for motif 1
A 0.037 C 0.111 T 0.852 G 0.000
A 0.037 C 0.037 T 0.000 G 0.926
A 0.000 C 0.630 T 0.185 G 0.185
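A short sketch contrasts the default output with printf for a single position, and uses sprintf (which takes the same format but returns the string instead of printing it) to show that the field width is a minimum rather than a maximum. The %motif values are assumptions chosen to match the first output line above.

```perl
use strict;
use warnings;

# Hypothetical distribution for one position (multiples of 1/27).
my %motif    = (1 => { A => 1/27, C => 3/27, T => 23/27, G => 0 });
my $position = 1;

# Default stringification shows many digits: 0.851851851851852
print "$motif{$position}{T}\n";

# %s prints the base verbatim; %5.3f rounds to three decimals.
# Together these print: A 0.037 C 0.111 T 0.852 G 0.000
foreach my $base ('A', 'C', 'T', 'G') {
    printf('%s %5.3f ', $base, $motif{$position}{$base});
}
print "\n";

# The width 5 is a minimum: a wider number is not truncated,
# and a larger width pads the number with spaces on the left.
my $wide = sprintf('%5.3f', 12.5);   # "12.500" (six characters)
my $pad  = sprintf('%7.3f', 0.5);    # "  0.500" (seven characters)
print "[$wide] [$pad]\n";
```

For probabilities between 0 and 1, %5.3f always produces exactly five characters, which keeps the columns of the output aligned.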