Introductory Biostatistics


If a data set is to be grouped to form a frequency distribution, certain difficulties
should be recognized, and an efficient strategy is needed for better communication.
First, there is no clear-cut rule on the number of intervals or classes.
With too many intervals, the data are not summarized enough for a clear
visualization of how they are distributed. On the other hand, too few intervals
are undesirable because the data are oversummarized, and some of the details
of the distribution may be lost. In general, between 5 and 15 intervals are
acceptable; of course, this also depends on the number of observations: we can
and should use more intervals for larger data sets.
The widths of the intervals must also be decided. Example 2.1 shows the
special case of mortality data, where it is traditional to show infant deaths
(deaths of persons who are born alive but die before reaching one year of age)
in their own interval. Without such specific reasons, intervals generally should
be of the same width. This common width w may be determined by dividing the
range R by k, the number of intervals:

$$ w = \frac{R}{k} $$

where the range R is the difference between the largest and smallest observations
in the data set. In addition, the width should be chosen so that it is convenient
to use or easy to recognize, such as a multiple of 5 (or of 1, for example, if the
data set has a narrow range). Similar considerations apply to the choice of the
beginning of the first interval: it should be a convenient number that is low
enough for the first interval to include the smallest observation. Finally, care
should be taken in deciding in which interval to place an observation that falls
on one of the interval boundaries. For example, one consistent rule is to place
such an observation in the interval of which it is the lower limit.
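The grouping strategy above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's procedure: the function name `frequency_table` and the parameter `unit` are hypothetical, the raw width R/k is assumed to be rounded up to a multiple of `unit`, and the first lower limit is assumed to be the largest multiple of `unit` not exceeding the smallest observation. Boundary values follow the lower-limit rule, so each interval is closed on the left and open on the right.

```python
import math
from collections import Counter


def frequency_table(data, k, unit=5):
    """Group numbers into equal-width intervals (hypothetical helper).

    Assumptions: the width R/k is rounded UP to a multiple of `unit`, and the
    first lower limit is the largest multiple of `unit` <= the smallest value.
    Returns a list of (lower, upper, count) tuples.
    """
    smallest, largest = min(data), max(data)
    r = largest - smallest                      # range R
    raw_width = r / k                           # w = R / k
    # Round up to a convenient multiple of `unit` (at least one unit wide).
    width = max(unit, math.ceil(raw_width / unit) * unit)
    # Convenient starting point, low enough to include the smallest value.
    start = math.floor(smallest / unit) * unit
    # Lower-limit rule: a value equal to a boundary goes into the interval
    # it opens, i.e. intervals are [lower, lower + width).
    counts = Counter(int((x - start) // width) for x in data)
    n_intervals = int((largest - start) // width) + 1
    table = []
    for i in range(n_intervals):
        lower = start + i * width
        table.append((lower, lower + width, counts.get(i, 0)))
    return table


print(frequency_table([12, 15, 20, 27], k=3))
```

For the toy input shown, the width rounds to 5 and the start to 10, so the value 15 lands in the interval beginning at 15 rather than the one ending there. Because the starting point is moved down to a convenient number, the number of intervals produced can exceed k by one.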


Example 2.2 The following are weights in pounds of 57 children at a day-care
center:


68 63 42 27 30 36 28 32 79 27
22 23 24 25 44 65 43 25 74 51
36 42 28 31 28 25 45 12 57 51
12 32 49 38 42 27 31 50 38 21
16 24 69 47 23 22 43 27 49 28
23 19 46 30 43 49 12
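As a quick check on the figures worked out by hand, the 57 weights can be entered in Python and their count, extremes, and range computed. The list below simply transcribes the table of weights; the variable names are illustrative only.

```python
# Weights (lb) of the 57 children in Example 2.2, transcribed row by row.
weights = [
    68, 63, 42, 27, 30, 36, 28, 32, 79, 27,
    22, 23, 24, 25, 44, 65, 43, 25, 74, 51,
    36, 42, 28, 31, 28, 25, 45, 12, 57, 51,
    12, 32, 49, 38, 42, 27, 31, 50, 38, 21,
    16, 24, 69, 47, 23, 22, 43, 27, 49, 28,
    23, 19, 46, 30, 43, 49, 12,
]

n = len(weights)                              # number of observations
smallest, largest = min(weights), max(weights)
data_range = largest - smallest               # R = largest - smallest

print(n, smallest, largest, data_range)       # 57 12 79 67
```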

From the data set above we have:


  1. The smallest number is 12 and the largest is 79, so that


$$ R = 79 - 12 = 67 $$

TABULAR AND GRAPHICAL METHODS 59