2, 3, and so on, are called the trailing (or less significant) digits. They form the leaves
(the horizontal elements) of our display.^4
On the right side of Figure 2.7 you can see that next to the stem entry of 4 you have one
0, two 1s, a 2, two 3s, a 4, three 6s, a 7, an 8, and two 9s. These leaf values correspond to
the units’ digits in the raw data. Similarly, note how the leaves opposite the stem value of 5
correspond to the units’ digits of all responses in the 50s. From the stem-and-leaf display
you could completely regenerate the raw data that went into that display. For example, you
can tell that 11 students spent zero minutes playing electronic games, one student spent two
minutes, two students spent three minutes, and so on. Moreover, the shape of the display
looks just like a sideways histogram, giving you all of the benefits of that method of
graphing data as well.
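To make the construction concrete, here is a minimal Python sketch of a simple stem-and-leaf display, assuming non-negative whole-number data with the tens' digit as the stem and the units' digit as the leaf. The function names `stem_and_leaf` and `regenerate` are illustrative only, and the data are just the observations in the 40s through 70s that Figure 2.7 lists explicitly.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers by tens' digit (stem) and units' digit (leaf)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 46 -> stem 4, leaf 6
        rows[stem].append(leaf)
    return dict(rows)


def regenerate(rows):
    """Rebuild the sorted raw data from the display."""
    return sorted(stem * 10 + leaf
                  for stem, leaves in rows.items()
                  for leaf in leaves)


# Raw values shown explicitly in Figure 2.7 (40s through 70s only).
data = [40, 41, 41, 42, 43, 43, 44, 46, 46, 46, 47, 48, 49, 49,
        52, 54, 55, 55, 57, 58, 59, 59,
        63, 67,
        71, 75, 75, 76, 76, 78, 79]

rows = stem_and_leaf(data)
for stem in sorted(rows):
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in rows[stem])}")
```

Because each leaf keeps the units' digit, `regenerate(rows)` returns the original values in sorted order, which is the sense in which the display loses no information.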
One apparent drawback of this simple stem-and-leaf display is that for some data sets
it will lead to a grouping that is too coarse for our purposes. In fact, that is why I needed to
use hypothetical data for this introductory example. When I tried to use the reaction time
data, I found that the stem for 50 (i.e., 5) had 88 leaves opposite it, which was a little silly.
Not to worry; Tukey was there before us and figured out a clever way around this problem.
If the problem is that we are trying to lump together everything between 50 and 59,
perhaps what we should be doing is breaking that interval into smaller intervals. We could try
using the intervals 50–54, 55–59, and so on. But then we couldn’t just use 5 as the stem,
because it would not distinguish between the two intervals. Tukey suggested using “5*” to
represent 50–54, and “5.” to represent 55–59. But that won’t solve our problem here,
because the categories still are too coarse. So Tukey suggested an alternative scheme where
“5*” represents 50–51, “5t” represents 52–53, “5f” represents 54–55, “5s” represents
56–57, and “5.” represents 58–59. (Can you guess why he used those particular letters?
Hint: “Two” and “three” both start with “t.”) If we apply this scheme to the data on
reaction times, we obtain the results shown in Figure 2.8. In deciding on the number of stems
to use, the problem is similar to selecting the number of categories in a histogram. Again,
you want to do something that makes sense and that conveys information in a meaningful
way. The one restriction is that the stems should be the same width. You would not let one
stem be 50–54, and another 60–69.
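The two splitting schemes described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `split_stem_display`, the `SCHEMES` table, and the values in `times` are hypothetical (they are not the reaction time data behind Figure 2.8). Two lines per stem use the suffixes "*" and "."; five lines per stem use "*", "t", "f", "s", and ".".

```python
# Suffixes for each scheme, in order of increasing leaf value.
SCHEMES = {
    2: ["*", "."],                 # leaves 0-4, 5-9
    5: ["*", "t", "f", "s", "."],  # leaves 0-1, 2-3, 4-5, 6-7, 8-9
}


def split_stem_display(values, lines_per_stem=5):
    """Build a stem-and-leaf display with each stem split into equal-width lines."""
    suffixes = SCHEMES[lines_per_stem]
    width = 10 // lines_per_stem  # number of leaf values covered by each line
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        label = f"{stem}{suffixes[leaf // width]}"
        rows.setdefault(label, []).append(leaf)
    return rows


# Made-up values for illustration only.
times = [50, 51, 51, 52, 53, 54, 55, 56, 58, 59, 60, 62]

five_line = split_stem_display(times, lines_per_stem=5)
two_line = split_stem_display(times, lines_per_stem=2)

for label, leaves in five_line.items():
    print(f"{label:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

Every line within a scheme covers the same number of leaf values, which keeps the stems the same width as the text requires.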
Section 2.4 Stem-and-Leaf Displays 25
Raw Data              Stem   Leaf
.                       0    00000000000233566678
.                       1    2223555579
.                       2    33577
40 41 41 42 43          3    22278999
43 44 46 46 46          4    01123346667899
47 48 49 49             5    24557899
52 54 55 55 57          6    37
58 59 59                7    1556689
63 67                   8    34779
71 75 75 76 76          9    466
78 79                  10    23677
.                      11    3479
.                      12    2557899
.                      13    89
Figure 2.7 Stem-and-leaf display of electronic game data
(^4) It is not always true that the tens’ digits form the stem and the units’ digits the leaves. For example, if the data
ranged from 100 to 1000, the hundreds’ digits would form the stem, the tens’ digits the leaves, and we would
ignore the units’ digits.
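As a rough sketch of the case this footnote describes, the Python below uses the hundreds' digit as the stem and the tens' digit as the leaf, dropping the units' digit by truncation rather than rounding. The function name `hundreds_display` and the values in `scores` are hypothetical.

```python
def hundreds_display(values):
    """Stem = hundreds' digit(s), leaf = tens' digit; the units' digit is ignored."""
    rows = {}
    for v in sorted(values):
        stem = v // 100        # 256 -> 2; 1000 -> 10
        leaf = (v // 10) % 10  # 256 -> 5; the trailing 6 is dropped
        rows.setdefault(stem, []).append(leaf)
    return rows


# Made-up values between 100 and 1000 for illustration only.
scores = [123, 127, 184, 256, 309, 1000]

display = hundreds_display(scores)
for stem, leaves in display.items():
    print(f"{stem:>2} | {''.join(str(leaf) for leaf in leaves)}")
```

Because the units' digits are discarded, a display built this way can no longer regenerate the raw data exactly: 123 and 127 both appear as a leaf of 2 on the stem of 1.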