Figure 3.13: Box and whisker plot for the variable
ASCORE1, total A-level score (stem and leaf plot
adjacent)
Grouped Frequency Table
In large data sets with continuous variables, a grouped frequency table may be used to
obtain a general picture of how the data are distributed. For example, look at the values
of the variable AGEY (age in years) in Figure 3.12. It is difficult to discern any pattern
in the distribution of ages, but rearranging the data in a grouped frequency table may
provide a clearer picture.
With so many data values, a frequency distribution constructed by counting the number of
cases observed at each age would be little more informative than an ordered list of the
individual ages. In these circumstances it is often more convenient to group the data
values into ranges, called class intervals, and to record the frequency within each
interval. A grouped frequency distribution differs from an ungrouped one in only one
respect: instead of giving a frequency for each possible data value, it gives a frequency
for each class interval.
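The grouping step can be sketched in Python. This is a language-neutral illustration, not the SAS approach used in this chapter. The function name `grouped_frequencies`, its default boundaries (a lower boundary of 16.5 and a class width of 2, chosen to match the intervals in Example 3.8) and the sample ages are all hypothetical. The ages are invented for illustration and are not the AGEY data.

```python
from collections import Counter


def grouped_frequencies(values, lower=16.5, width=2.0, n_classes=6):
    """Return (label, frequency) pairs for equal-width class intervals.

    Class k covers lower + k*width up to (but not including)
    lower + (k+1)*width, so adjacent classes never overlap.
    """
    counts = Counter()
    for v in values:
        k = int((v - lower) // width)  # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"value {v} lies outside all class intervals")
        counts[k] += 1

    table = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        # Label by the whole years covered, e.g. 16.5 to 18.5 -> "17-18"
        label = f"{int(lo + 0.5)}-{int(hi - 0.5)}"
        table.append((label, counts[k]))  # Counter gives 0 for empty classes
    return table


if __name__ == "__main__":
    ages = [17, 18, 18, 19, 21, 22, 25, 28]  # invented ages, not the AGEY data
    for label, freq in grouped_frequencies(ages):
        print(f"{label:>6} {freq:3d}")
```

Placing the class boundaries at half-years (16.5, 18.5, ...) means that no whole-year age can fall exactly on a boundary, so every observation belongs to exactly one class.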
In addition to the simple frequency count for each class interval, the relative number, or
percentage, of observations falling into each class interval is usually reported. These are
called relative frequencies and are expressed as percentages. The percentage for a given
class is obtained by dividing the class frequency by the total frequency for all classes and
multiplying by 100. Apart from rounding error, the relative frequencies should sum to
100 per cent, which provides a quick check for errors. The advantage of a relative frequency
distribution is that it expresses the pattern of scores in a way that does not depend on the
total number of cases observed, so distributions based on samples of different sizes can be
compared directly.
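The calculation of relative frequencies, percentage = 100 × (class frequency / total frequency), together with the 100 per cent check, can be sketched in Python as follows. The function names `relative_frequencies` and `percentages_sum_to_100` and the counts in the usage example are hypothetical, invented for illustration. They are not output from the A-level data set.

```python
def relative_frequencies(table):
    """Add a relative frequency (percentage) to each (label, frequency) pair.

    percent = 100 * class frequency / total frequency over all classes
    """
    total = sum(freq for _, freq in table)
    if total == 0:
        raise ValueError("the table contains no observations")
    return [(label, freq, 100.0 * freq / total) for label, freq in table]


def percentages_sum_to_100(rel_table, tolerance=1e-6):
    """Quick error check: the percentages should total 100.

    The small tolerance allows for floating-point rounding only.
    """
    return abs(sum(pct for _, _, pct in rel_table) - 100.0) <= tolerance


if __name__ == "__main__":
    # Invented counts for illustration only (not results from the AGEY data)
    counts = [("17-18", 3), ("19-20", 1), ("21-22", 2),
              ("23-24", 0), ("25-26", 1), ("27-28", 1)]
    rel = relative_frequencies(counts)
    for label, freq, pct in rel:
        print(f"{label:>6} {freq:3d} {pct:6.1f}%")
    print("check passed:", percentages_sum_to_100(rel))
```

Because each count is divided by the total number of cases, two samples of very different sizes with the same pattern of ages would produce identical percentage columns.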
Example 3.8
To obtain a grouped relative frequency table for the variable AGEY in the A-level data
set, use the following SAS code:
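proc format;
   /* Class intervals for AGEY (age in whole years). The -< operator   */
   /* excludes the upper boundary, so adjacent ranges do not overlap.  */
   value clasfmt
      16.5-<18.5='17-18'
      18.5-<20.5='19-20'
      20.5-<22.5='21-22'
      22.5-<24.5='23-24'
      24.5-<26.5='25-26'
      26.5-<28.5='27-28';
run;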
proc freq;
Statistical analysis for education and psychology researchers 58