for purposes of estimating variability. Thus, you lose that degree of freedom that we
discussed, and you have only N − 1 df left (N − 1 scores free to vary). We lose this one
degree of freedom whenever we estimate a mean. It follows that the denominator (the number
of scores on which our estimate is based) should reflect this restriction. It represents the
number of independent pieces of data.
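As a minimal sketch of this restriction, the Python fragment below uses a small set of hypothetical scores (not data from this chapter) to show that once the mean has been estimated, the last deviation is fixed by the other N − 1, and that the variance estimate therefore divides by N − 1. The function name variance_estimate and its ddof argument are illustrative choices, not part of any particular package.

```python
import math
import statistics

def variance_estimate(scores, ddof=1):
    """Sum of squared deviations divided by (N - ddof)."""
    n = len(scores)
    mean = sum(scores) / n
    ss = sum((x - mean) ** 2 for x in scores)
    return ss / (n - ddof)

scores = [1, 2, 3, 4, 5]  # hypothetical scores, for illustration only
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

# Deviations from the sample mean sum to zero, so the last one is
# determined by the first N - 1: only N - 1 are free to vary.
last_from_others = -sum(deviations[:-1])

sample_var = variance_estimate(scores, ddof=1)  # divides by N - 1
biased_var = variance_estimate(scores, ddof=0)  # divides by N

print(last_from_others, deviations[-1])
print(sample_var, biased_var)
```

The standard library's statistics.variance also divides by N − 1, whereas statistics.pvariance divides by N, so the two can serve as a check on the hand computation.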
2.9 Boxplots: Graphical Representations of Dispersions and Extreme Scores
Earlier you saw how stem-and-leaf displays represent data in several meaningful ways at
the same time. Such displays combine data into something very much like a histogram,
while retaining the individual values of the observations. In addition to the stem-and-leaf
display, John Tukey has developed other ways of looking at data, one of which gives
greater prominence to the dispersion of the data. This method is known as a boxplot, or,
sometimes, a box-and-whisker plot.
The data and the accompanying stem-and-leaf display in Table 2.7 were taken from
normal- and low-birthweight infants participating in a study of infant development at the
University of Vermont and represent preliminary data on the length of hospitalization of
38 normal-birthweight infants. Data on three infants are missing for this particular vari-
able and are represented by an asterisk (*). (Asterisks are included to emphasize that we
should not just ignore missing data.) Because the data vary from 1 to 10, with two ex-
ceptions, all the leaves are zero. The zeros really just fill in space to produce a histogram-
like distribution. Examination of the data as plotted in the stem-and-leaf display reveals
that the distribution is positively skewed with a median stay of 3 days. Near the bottom
of the stem you will see the entry HI and the values 20 and 33. These are extreme values,
or outliers, and are set off in this way to highlight their existence. Whether they are large
enough to make us suspicious is one of the questions a boxplot is designed to address.
The last line of the stem-and-leaf display indicates the number of missing observations.
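The features just described can be checked with a short Python sketch. The values are the lengths of stay as read from Table 2.7, with None standing in for each asterisk. The cutoff of 10 days for the HI line is an assumption that simply mirrors the display (the text notes that all but two values fall between 1 and 10); it is not a general rule for identifying outliers. Comparing the mean with the median is used here only as a rough indicator of positive skew.

```python
from collections import Counter
from statistics import mean, median

# Length of stay in days, row by row from Table 2.7; None marks a missing value.
stays = [
    2, 1, 7,      1, 33, 2,     2, 3, 4,     3, None, 4,
    3, 3, 10,     9, 2, 5,      4, 3, 3,     20, 6, 2,
    4, 5, 2,      1, None, None,
    3, 3, 4,      2, 3, 4,      3, 2, 3,     2, 4,
]

observed = [x for x in stays if x is not None]
n_missing = len(stays) - len(observed)

HI_CUTOFF = 10  # assumption: values above 10 are set off on the HI line
body = [x for x in observed if x <= HI_CUTOFF]
high = sorted(x for x in observed if x > HI_CUTOFF)

# Every value is a whole number of days, so each leaf is simply a zero.
counts = Counter(body)
lines = [f"{stem:>7} | {'0' * counts[stem]}" for stem in range(1, HI_CUTOFF + 1)]
lines.append(f"{'HI':>7} | {', '.join(str(x) for x in high)}")
lines.append(f"Missing = {n_missing}")
print("\n".join(lines))

med = median(observed)
avg = mean(observed)
print("median:", med, "mean:", round(avg, 2))
```

The mean is pulled above the median by the two long stays, which is consistent with the positive skew visible in the display.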
Tukey originally defined boxplots in terms of special measures that he devised. Most
people now draw boxplots using more traditional measures, and I am adopting that ap-
proach in this edition.
48 Chapter 2 Describing and Exploring Data
Table 2.7 Data and stem-and-leaf display on length
of hospitalization for full-term newborn infants (in days)

      Data             Stem-and-Leaf
  2    1    7            1 | 000
  1   33    2            2 | 000000000
  2    3    4            3 | 00000000000
  3    *    4            4 | 0000000
  3    3   10            5 | 00
  9    2    5            6 | 0
  4    3    3            7 | 0
 20    6    2            8 |
  4    5    2            9 | 0
  1    *    *           10 | 0
  3    3    4           HI | 20, 33
  2    3    4      Missing = 3
  3    2    3
  2    4