10.1 Text Transformations 205

to perform the transformation, file.txt is the name of the file to be
transformed (also called the “input” file), and result.txt is the transformed
file that is produced by your program. One can specify more than one file to
be transformed, but only one file is produced as a result of the
transformation. It is as if there were a single input file made up of the data
in all of the files put together in order.
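
The "single input file" behavior described above can be sketched in a few lines. This is only an illustration, written in Python as an assumed language; the function name read_all_lines is hypothetical and is not taken from the text.

```python
def read_all_lines(paths):
    """Yield the lines of every file in `paths`, in the order given.

    The caller sees one continuous stream of lines, as if the files
    had been joined end to end into a single input file.
    """
    for path in paths:
        with open(path) as handle:
            # Hand each line to the caller before reading the next one.
            yield from handle
```

A caller can then loop over read_all_lines(["first.txt", "second.txt"]) without knowing where one file ends and the next begins (both file names are placeholders).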
The programs in this part of the chapter consider input file formats and
transformation tasks that get progressively more complex. The early ones
use simple flat files with fixed-width format, and the later ones use more
complex formats. Early tasks make no changes to the data in the input files;
the task is just to change the format. Later tasks perform statistical
computations.

10.1.1 Line-Oriented Transformation


The simplest approach to transformation is just to read the file one line at a
time, transforming each line as it is read. The program for this looks a lot
like a book or paper: it has an introduction, a main body, and a conclusion.
The introduction takes care of tasks that precede the transformation such as
printing a report title, and the conclusion performs tasks such as printing
summary information. Sometimes the introduction or the conclusion will be
omitted, but there will always be a body, as that is where the transformation
takes place.
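
The three-part structure just described might look like the following minimal sketch. Python is an assumed language here, and the names run and transform_line, the report title, and the upper-casing step are all placeholders invented for illustration.

```python
def transform_line(line):
    """Hypothetical per-line transformation: upper-case the text."""
    return line.upper()


def run(lines, out):
    """Transform `lines` one at a time, writing the result to `out`."""
    # Introduction: tasks that precede the transformation.
    out.write("Report Title\n\n")

    # Body: transform each line as it is read.
    count = 0
    for line in lines:
        out.write(transform_line(line.rstrip("\n")) + "\n")
        count += 1

    # Conclusion: summary information.
    out.write(f"\n{count} lines processed\n")
```

Calling run(sys.stdin, sys.stdout) after importing sys would apply the sketch to standard input. Either the introduction or the conclusion can be deleted without affecting the loop in the body.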
Consider the task in which the data from the health study mentioned in
section 1.1 are to be transformed from the fixed-width format to a
variable-width format that is more readable for people. The input file has
lines that start like this:

011500 18.66 0 0 62 46.27102
011500 26.93 0 1 63 68.95152
020100 33.95 1 0 65 92.53204
020100 17.38 0 0 67 50.35111

The output should look like this:

Health Study Data

1/15/2000 18.66 normal 62 cm 46.27 kg 102 lb
1/15/2000 26.93 overweight 63 cm 68.95 kg 152 lb
2/1/2000 33.95 obese 65 cm 92.53 kg 204 lb
2/1/2000 17.38 normal 67 cm 50.35 kg 111 lb
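
One way to produce this output from the sample input is sketched below, in Python as an assumed language. Several details are assumptions read off the four sample lines rather than a stated file specification: the column offsets, the reading of the first flag as "obese" and the second as "overweight", and the expansion of the two-digit year to 20YY. The function names are hypothetical.

```python
def transform_record(line):
    """Convert one fixed-width record into the readable format."""
    # Column offsets are assumptions inferred from the sample lines.
    month = int(line[0:2])
    day = int(line[2:4])
    year = 2000 + int(line[4:6])  # assumption: every year is 20YY
    bmi = line[7:12].strip()
    obese = line[13] == "1"       # assumption: first flag means obese
    overweight = line[15] == "1"  # assumption: second flag means overweight
    height = line[17:19].strip()
    weight_kg = line[20:25].strip()
    weight_lb = line[25:28].strip()  # runs directly on from the kg field

    if obese:
        status = "obese"
    elif overweight:
        status = "overweight"
    else:
        status = "normal"

    return (f"{month}/{day}/{year} {bmi} {status} "
            f"{height} cm {weight_kg} kg {weight_lb} lb")


def transform(lines, out):
    """Write the report title, then one readable line per record."""
    out.write("Health Study Data\n\n")
    for line in lines:
        if line.strip():  # skip blank lines
            out.write(transform_record(line) + "\n")
```

Slicing by position, rather than splitting on spaces, matters here because the kilogram and pound weights share no separator in the input (46.27102 is 46.27 followed by 102).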