Basic Statistics


FREQUENCY TABLES AND THEIR GRAPHS


make, especially if one starts with an ordered array such as that given in Table 4.2.
Statistical packages such as Minitab, SAS, SPSS, and Stata provide stem and leaf
tables.
Alternatively, data may be presented in the form of a frequency table. In a frequency
table, details are sacrificed in the hope of showing broad essentials more clearly.
A frequency table can also be viewed as the first step in making a histogram (see
Section 4.2.1).


4.1.3 The Frequency Table

To make a frequency table, we find the interval that includes the smallest and largest
observations in the data set (here 12.2-26.2) and decide on some convenient way of
dividing it into intervals called class intervals or classes. The number of observations
that fall in each class interval is then counted; these counts form a column
headed frequency.
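The counting procedure just described can be sketched in a few lines of Python. This is only an illustration, not the method used to build Table 4.4: the function name frequency_table, its parameters, and the sample readings are all hypothetical, and the sketch assumes equal-width classes, values recorded to a fixed number of decimal places, and no value below the chosen starting point.

```python
from collections import Counter

def frequency_table(values, start, width, decimals=1):
    """Count observations in equal-width class intervals.

    Assumes every value is >= start and was recorded to `decimals` places.
    Returns a list of (class label, frequency) pairs.
    """
    if not values:
        return []
    scale = 10 ** decimals
    # Work in whole units of the measurement precision (12.9 becomes 129)
    # so that class membership is decided with exact integer arithmetic.
    start_u = round(start * scale)
    width_u = round(width * scale)
    counts = Counter((round(v * scale) - start_u) // width_u for v in values)
    rows = []
    for k in range(max(counts) + 1):
        lo_u = start_u + k * width_u
        hi_u = lo_u + width_u - 1  # last recordable value in the class
        label = f"{lo_u / scale:.{decimals}f}-{hi_u / scale:.{decimals}f}"
        rows.append((label, counts.get(k, 0)))
    return rows

# Hypothetical readings, NOT the hemoglobin data of Table 4.4.
sample = [12.2, 12.9, 13.0, 13.4, 14.1, 14.1, 15.7]
for label, freq in frequency_table(sample, start=12.0, width=1.0):
    print(label, freq)
```

Empty classes between the smallest and largest observations are kept with a frequency of zero, so that the table shows gaps in the data rather than hiding them.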
Table 4.4 shows a frequency table of the hemoglobin data for the 90 workers. In
Table 4.4, the first class interval was chosen to be 12.0-12.9. Here, 12.0 was chosen
for convenience as the starting point, and the length of each interval is 1 g/cm3. The
table succeeds in giving the essentials of the entire set of data in a form that is compact
and can be read quickly.
The investigator who collected the data may find the frequency table as it stands
quite adequate for their use, or they may for some purposes prefer using the original
90 observations. An investigator wishing to publish printed data for others to use
may publish a frequency table rather than the raw data unless the data set is small. It
is important in either case that the table be properly labeled, with title, source, units,
and so on. It is also important that the class intervals be designated in such a way that
it is clear exactly which numbers are included in each class.
In Table 4.4, the designation of the class intervals is done in the most usual way. We
might wonder, however, in looking at it, what happened to workers whose hemoglobin
measurements were between 12.9 and 13.0 g/cm3. The answer is that the measurements
were made originally to the nearest 0.1 g/cm3, so there are no measurements
listed between 12.9 and 13.0 g/cm3. The class intervals were made to reflect the way
the measurements were made; if the measurements had been made to the nearest
0.01 g/cm3, the appropriate first interval would be 12.00-12.99.
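The dependence of the class limits on the measurement precision can be shown with a small sketch. The helper class_label is hypothetical and assumes the stated upper limit is the last value that can be recorded before the next class begins; Decimal is used so that the labels keep the same number of decimal places as the measurements.

```python
from decimal import Decimal

def class_label(lower, width, precision):
    """Label a class interval by its lowest and highest recordable values.

    Arguments are strings such as "12.0" so that Decimal keeps them exact.
    """
    lower, width, precision = Decimal(lower), Decimal(width), Decimal(precision)
    # The highest recordable value lies one precision step below the
    # lower limit of the next class.
    upper = lower + width - precision
    return f"{lower}-{upper}"

print(class_label("12.0", "1", "0.1"))    # measurements to the nearest 0.1
print(class_label("12.00", "1", "0.01"))  # measurements to the nearest 0.01
```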
Table 4.4 also displays the midpoints of each interval. The midpoint for the first
interval is the average of 11.95 and 12.95, or (11.95 + 12.95)/2 = 12.45. Note that
all hemoglobin levels between 11.95 and 12.95 fall in the first interval because the
measurements have been made to the nearest 0.1 g/cm3. Subsequent midpoints are
found by adding one to the previous midpoint since the class interval is 1 g/cm3 long.
The midpoints are used to represent a typical value in the class interval.
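As a check on the arithmetic, the midpoints can be computed from the first class interval and the measurement precision. The sketch below is illustrative only: the function class_midpoints and its parameters are hypothetical, and it assumes equal-width classes whose true limits extend half a precision step beyond the stated limits (11.95 and 12.95 for the class 12.0-12.9).

```python
from decimal import Decimal

def class_midpoints(first_lower, width, precision, n_classes):
    """Midpoints of n_classes equal-width class intervals.

    Arguments other than n_classes are strings so that Decimal keeps them exact.
    """
    first_lower, width, precision = map(Decimal, (first_lower, width, precision))
    half = precision / 2
    true_lower = first_lower - half                      # 11.95 for 12.0-12.9
    true_upper = first_lower + width - precision + half  # 12.95 for 12.0-12.9
    first_mid = (true_lower + true_upper) / 2
    # Each later midpoint is one class width above the previous one.
    return [first_mid + k * width for k in range(n_classes)]

for mid in class_midpoints("12.0", "1", "0.1", 3):
    print(mid)
```

For the first class this gives (11.95 + 12.95)/2 = 12.45, and each later midpoint is 1 g/cm3 higher than the one before it.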
There is no one “correct” frequency table such that all the rest are incorrect, but
some are better than others in showing the important features of the set of data without
keeping too much detail. Often, a researcher chooses a class interval based on one
used in the past; for researchers wishing to compare their data with that of others, this
