10.1 Text Transformations 213
@record = split;
This is actually better than the previous form because it treats all forms
of “white space” (such as tab characters) as being the same as spaces.
Finally, one can abbreviate all the way to split; except that now the array
containing the fields of the line is @_ instead of @record. This implicit
split into @_ was deprecated and no longer works as of Perl 5.12, so the
explicit assignment to @record is the safer form.
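The difference in white space handling can be seen in a small sketch. The record below is made up for illustration, and the comparison with splitting on a single space character, / /, is an assumption about what a less forgiving form might look like; it is not taken from the health study program.

```perl
use strict;
use warnings;

# Hypothetical record: a tab and a space after the date, three spaces after 70.
my $line = "1/15/2004\t 70   180";

# Splitting on exactly one space leaves the tab attached to the first
# field and produces empty fields between consecutive spaces.
my @by_space = split(/ /, $line);

# A bare split works on $_ and treats any run of white space
# (spaces, tabs, newlines) as a single separator.
$_ = $line;
my @record = split;

print scalar(@by_space), " fields when splitting on / /\n";   # 5
print scalar(@record), " fields with a bare split\n";         # 3
```

With the bare split, @record holds just the three values 1/15/2004, 70 and 180, no matter how the white space between them was typed.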
The opposite of split is join. One can put the split array back together
after splitting by using
join(" ", @record);
This can be handy if one would like to separate the fields with a character
other than a space. For example,
join(",", @record);
would use a comma to separate the fields.
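As a minimal sketch of the round trip, the following splits a made-up record (the values are hypothetical) and joins it again with each separator:

```perl
use strict;
use warnings;

# Hypothetical record; a tab separates the last two fields.
$_ = "1/15/2004 70\t180";
my @record = split;

# Rejoin the fields with a space, then with a comma.
my $spaced = join(" ", @record);   # "1/15/2004 70 180"
my $csv    = join(",", @record);   # "1/15/2004,70,180"

print "$spaced\n";
print "$csv\n";
```

Note that the rejoined line need not be identical to the original one: the tab in the input comes back as a single space, because split discards the separators it finds.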
Statistics for an entire population, such as those just computed, are generally of
limited interest. It is far more interesting to look for correlations between
various characteristics of the population. When characteristics, such as ages,
have a temporal significance, it is also interesting to look for trends. Consider
the task of computing the BMI as a function of the month and year. In statis-
tical terminology one is interested in the conditional probability distribution
of the BMI given the month and year. The month and year are specified by
the first and third parts of the first field of the health study record (the second
being the day of the month). The task is to compute the mean and variance
of the BMI grouped by the month and year.
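Picking out the month and year can be sketched with a second split applied to the first field. The sketch assumes that the date is written as month/day/year with "/" between the parts. That separator, the sample values, and the variable names are assumptions made for illustration, so adjust the pattern to the actual record format.

```perl
use strict;
use warnings;

# Hypothetical record: the first field is the date, assumed month/day/year.
$_ = "1/15/2004 70 180";
my @record = split;

# Split the date on "/" and keep the first and third parts.
my ($month, $day, $year) = split(/\//, $record[0]);

# Combine month and year into one label for the group.
my $month_year = "$month/$year";
print "$month_year\n";   # 1/2004
```

All records that yield the same label, here 1/2004, belong to the same group.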
Using scalar variables or arrays is not enough to accomplish this task. We
need a way to group information by month. The information belonging to a
month is said to be associated with the month and year. Each month is called
a key for the associated information, and each key is mapped to its associated
information. A mapping from keys to associated information is called an
associative array, hash table, or just hash, for short. Perl uses the character %
to distinguish hashes from scalars and arrays. In the following program,
statistics are computed for each month separately using three hashes:
print "Health Study Data\n\n";
%count = ();
%bmisum = ();
%bmisumsq = ();