tables agey;
format agey clasfmt.;
run;
The format procedure, when used in this way, automatically makes overlapping range
values noninclusive: a value that falls on a shared boundary is included in the first range
in which it occurs and excluded from the second. Your output should look similar to
Table 3.4. A glance at this table reveals that there are 114 observations (the cumulative
frequency total), which tallies with the expected number of cases. The minimum value lies
in the class interval 17–18 and the maximum value in the interval 27–28. You can also
see that the largest percentage of students, 68.4 per cent, is in the age range 19 to 20
years. The single observation in the interval 27–28 appears to be an outlier.
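The first-match rule for shared boundaries can be imitated outside SAS. The Python sketch below is an illustration, not the SAS implementation. The class boundaries (16.5 to 28.5 in two-year steps) are hypothetical stand-ins for the clasfmt. definition, which is not shown in this section.

```python
# Hypothetical class boundaries: the real ranges of clasfmt. are defined
# earlier in the SAS program and are not reproduced here.
CLASSES = [
    (16.5, 18.5, "17-18"),
    (18.5, 20.5, "19-20"),
    (20.5, 22.5, "21-22"),
    (22.5, 24.5, "23-24"),
    (24.5, 26.5, "25-26"),
    (26.5, 28.5, "27-28"),
]


def classify(value, classes=CLASSES):
    """Return the label of the first range that contains value.

    Both endpoints are tested inclusively, but ranges are checked in order,
    so a value on a shared boundary is assigned to the earlier range and
    is never counted twice.
    """
    for low, high, label in classes:
        if low <= value <= high:
            return label
    return None  # value lies outside every range


print(classify(18.5))  # on the 17-18 / 19-20 boundary -> 17-18
print(classify(18.6))  # -> 19-20
```

Because the loop stops at the first matching range, the order of the ranges, not the endpoint notation alone, decides where a boundary value goes.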
Percentages ought not to be used with small numbers because a small change in the
number of cases brings about an apparently large change in percentage points.
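The size of this effect is simply 100 divided by the number of cases. The short Python sketch below uses hypothetical sample sizes (10 and 25) alongside the 114 cases of Table 3.4 to show how many percentage points a single case represents.

```python
def points_per_case(n):
    """Percentage points contributed by a single case when there are n cases."""
    return 100.0 / n


# 10 and 25 are hypothetical sample sizes; 114 is the number of cases in Table 3.4.
for n in (10, 25, 114):
    print(f"n = {n:3d}: one case = {points_per_case(n):.1f} percentage points")
```

With 10 cases, moving one case between categories shifts the percentages by 10 points; with 114 cases the same move shifts them by less than 1 point.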
Percentages or relative frequencies are particularly useful when comparing two or more
distributions that contain different numbers of data points. Converting each count to a
percentage puts the distributions on a common scale, as if each had 100 scores.
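As a minimal sketch of this idea, the Python code below converts category counts to percentages. The two groups and their counts are hypothetical: group B is four times the size of group A, yet once expressed as relative frequencies the two distributions are seen to have the same shape.

```python
def relative_frequencies(counts):
    """Convert a dict of category counts to percentages of the total."""
    total = sum(counts.values())
    return {category: 100.0 * c / total for category, c in counts.items()}


# Hypothetical counts: group B is four times the size of group A.
group_a = {"low": 10, "medium": 25, "high": 15}    # 50 cases
group_b = {"low": 40, "medium": 100, "high": 60}   # 200 cases

print(relative_frequencies(group_a))
print(relative_frequencies(group_b))
```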
Because tables are so often used in journal articles and reports to present data or
summary statistics, a few comments on clear presentation are included here. First,
provide a clear explanatory title, including units of measurement where appropriate.
Arrange the table so that columns are longer than rows are wide, because it is easier to
look down a column than to scan across a row. Round the numbers to an appropriate
number of decimal places, seldom more than two, and arrange the data in a natural order
or in order of size. Avoid footnotes if possible. Finally, summarize in a brief paragraph
the main patterns and features of the data shown in the table.
Table 3.4: Grouped relative frequency table for the variable age in years
AGE IN YEARS
AGEY     Frequency   Percent   Cumulative Frequency   Cumulative Percent
17–18           26      22.8                     26                 22.8
19–20           78      68.4                    104                 91.2
21–22            8       7.0                    112                 98.2
23–24            1       0.9                    113                 99.1
27–28            1       0.9                    114                100.0
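Each derived column of Table 3.4 follows directly from the frequencies: percent is frequency ÷ 114 × 100, and the cumulative columns are running totals (for example, the cumulative frequency for 21–22 is 26 + 78 + 8 = 112). The Python sketch below is an illustration outside SAS that recomputes these columns from the printed frequencies only, as a quick consistency check on the table.

```python
# Class labels and frequencies as printed in Table 3.4
# (no 25-26 row appears in the table).
labels = ["17-18", "19-20", "21-22", "23-24", "27-28"]
freqs = [26, 78, 8, 1, 1]


def frequency_table(labels, freqs):
    """Return (label, frequency, percent, cumulative frequency,
    cumulative percent) rows, with percentages rounded to one decimal."""
    total = sum(freqs)
    rows = []
    cumulative = 0
    for label, f in zip(labels, freqs):
        cumulative += f
        rows.append((label, f, round(100 * f / total, 1),
                     cumulative, round(100 * cumulative / total, 1)))
    return rows


for label, f, pct, cum, cum_pct in frequency_table(labels, freqs):
    print(f"{label:>6} {f:9d} {pct:8.1f} {cum:10d} {cum_pct:10.1f}")
```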
In constructing Table 3.4, two related decisions have to be made: the number of class
intervals and the width of each class interval. The number of class intervals is usually
between 5 and 20, depending on the number of cases; generally, the smaller the number
of cases, the fewer class intervals should be used. Too many class intervals will not
summarize the data, and too few may not describe the data accurately. You should choose
natural intervals whenever possible. To estimate an approximate number of class
intervals, divide the range of the distribution by a selected interval width, choosing the
width so that you arrive at somewhere between 5 and 20 intervals.
The inclusive range of a distribution is the maximum data value minus the minimum
data value, plus 1. For the 114 ages recorded in Figure 3.12, the inclusive range is
(27.2 − 16.8) + 1 = 11.4.
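The arithmetic for choosing the number of intervals can be sketched in Python. This is an illustration under stated assumptions: only the two extreme ages from the text are used, and the number of intervals is taken as range ÷ width rounded up. A width of 2 years gives 6 intervals, which agrees with the two-year classes 17–18 to 27–28 used for Table 3.4 (the 25–26 class, presumably empty, is not printed).

```python
import math


def inclusive_range(values):
    """Inclusive range as defined in the text: maximum - minimum + 1."""
    return max(values) - min(values) + 1


def interval_counts(data_range, widths, lo=5, hi=20):
    """For each candidate width, return the number of intervals
    (range / width, rounded up) and whether it lies in the 5-20 band."""
    result = {}
    for w in widths:
        k = math.ceil(data_range / w)
        result[w] = (k, lo <= k <= hi)
    return result


# Only the two extreme ages from the text are used; the full list of
# 114 ages is not reproduced here.
r = round(inclusive_range([16.8, 27.2]), 1)
print(r)
for w, (k, ok) in interval_counts(r, [1, 2, 3]).items():
    print(f"width {w}: {k} intervals{'' if ok else ' (outside 5-20)'}")
```

Widths of 1 and 2 years both fall inside the suggested band; a width of 3 years gives only 4 intervals, which would probably summarize the ages too coarsely.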