10.1 Text Transformations 207
but rather indicate that the variables are scalars, that is, numbers or strings
(ordinary text). The line that was just read is available in the variable whose
name is an underscore character ($_). One extracts parts of a string by using the
substr function (short for “substring”). Scalars have a kind of “split personality”
since they can be either numbers or strings. The substr function
produces a string, but all of the substrings being extracted in this program
are supposed to be numbers. One can change a scalar to a number by adding
0 to it. If the scalar is already a number, this does nothing. If the scalar is a
string, Perl converts the string to a number.
Perl is very flexible in how it interprets strings as numbers: it uses the longest
leading part of the string that looks like a number (after any leading whitespace),
and the first character that could not be part of a number, along with everything
after it, is simply ignored. If there is no such leading part, the value is 0. For
example, “123 kgs” is interpreted as the number 123, and “Hello 123” is
interpreted as the number 0.
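The following is a minimal sketch of extracting fields with substr and converting them with + 0. The record and its column layout are hypothetical and chosen only for illustration; they are not the layout of the health-study file used by Program 10.1.

```perl
use strict;
use warnings;

# Hypothetical fixed-width record: a four-digit height in columns 1-4 and a
# weight in columns 6-9. This layout is illustrative, not the book's file.
my $record = "0175 82.5";

# substr(STRING, OFFSET, LENGTH) always returns a string, even for digits.
my $height_text = substr($record, 0, 4);    # "0175"
my $weight_text = substr($record, 5, 4);    # "82.5"

# Adding 0 makes Perl treat each string as a number.
my $height = $height_text + 0;              # 175 (leading zero dropped)
my $weight = $weight_text + 0;              # 82.5

print "height=$height weight=$weight\n";

# Only the leading numeric part of a string counts.
{
    no warnings 'numeric';    # suppress "isn't numeric" warnings in this block
    print "123 kgs" + 0, "\n";      # 123
    print "Hello 123" + 0, "\n";    # 0
}
```

Under use warnings, Perl reports strings such as “123 kgs” as not fully numeric, but it still performs the conversion; the no warnings block only silences that message for the demonstration.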
The computation of the year is somewhat problematic because the original file
contains only two digits of the year, but the report expects the full year. This
is handled by attaching conditions to the statements that compute the full
four-digit year. The assumption is that all years are between 1921 and 2020.
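One way to express this conversion is with statement modifiers (a condition written after the statement). The sketch below assumes, as the text does, that every year lies between 1921 and 2020; the helper name full_year is hypothetical and is not taken from Program 10.1.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a two-digit year, assuming every year
# lies between 1921 and 2020.
sub full_year {
    my ($yy) = @_;
    my $year = $yy + 0;              # "07" becomes 7
    $year += 2000 if $year <= 20;    # 00..20 -> 2000..2020
    $year += 1900 if $year < 100;    # 21..99 -> 1921..1999
    return $year;
}

print full_year("07"), "\n";    # 2007
print full_year("85"), "\n";    # 1985
```

The order of the two statements matters: once 2000 has been added, the year is no longer below 100, so the second condition cannot add 1900 again.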
Program 10.1 is certainly not the only way to perform this task in Perl.
The style of programming was chosen to make it as easy as possible to read
this program. Using angle brackets to obtain the next line of the file, which is
then available in the variable whose name is an underscore, is a bit obscure, but
it is a common idiom in Perl, and it is relatively easy to get used to.
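A minimal sketch of the idiom follows. When <$fh> is the whole condition of a while loop, Perl stores each line in $_ and stops at end of file, and functions such as chomp use $_ when given no argument. The sub first_fields and the in-memory data are hypothetical stand-ins for a real data file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first four characters of every line.
# In a while condition, <$fh> reads the next line into $_.
sub first_fields {
    my ($fh) = @_;
    my @fields;
    while (<$fh>) {
        chomp;                            # no argument: chomp operates on $_
        push @fields, substr($_, 0, 4);   # $_ holds the current line
    }
    return @fields;
}

# An in-memory "file" stands in for the real data file.
my $data = "0175 82.5\n0160 55.0\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @first = first_fields($fh);
close $fh;
print "@first\n";    # 0175 0160
```

Reading from a real file works the same way; only the open call changes.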
To illustrate some of the variations that are possible in Perl, Program 10.1
could also have been written as in Program 10.2. This program avoids the
use of parentheses as much as possible, and where it does use them, it uses
them differently than the first program. In general, one can omit the parentheses
around function arguments; they are needed only when Perl would otherwise
misinterpret your intentions. If the parentheses were omitted in the tests for obesity
and overweight, then Perl would have compared 0 with 3 rather than with
the number extracted from the original file. Notice also that the semicolon
after the last statement in a block can be omitted because it occurs immediately
before a right brace.
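The sketch below shows the kind of misreading the text warns about. Without parentheses, substr takes everything to its right as its argument list, so any arithmetic or comparison written after it lands inside its last argument. The record, offsets, and threshold are hypothetical; this does not reproduce the actual tests in Program 10.2.

```perl
use strict;
use warnings;

my $line = "x32y";    # hypothetical record: a two-digit value at offset 1

# With parentheses around the call, + 0 and > apply to the extracted value.
my $with_parens = (substr $line, 1, 2) + 0 > 30;    # 32 > 30: true

# Without them, the comparison becomes the LENGTH argument:
# substr($line, 1, (2 + 0 > 30)), i.e. a length of 0, giving "".
my $without_parens = substr $line, 1, 2 + 0 > 30;

print "with parens: ",    ($with_parens    ? "true" : "false"), "\n";
print "without parens: ", ($without_parens ? "true" : "false"), "\n";

# The last statement in a block needs no semicolon before the brace.
if ($with_parens) { print "overweight\n" }
```

Both statements compile without complaint, which is why this kind of mistake is easy to miss.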
The next task to consider is the computation of summary information. One
common use of data from a study is to compute the mean and variance. Pro-
gram 10.3 computes the mean, variance, and standard deviation of the BMI
column of the health study. Running this program on the four records at the
beginning of this section produces this report: