Notice in Figure 2.2 that the reaction time data are generally centered on 50–70 hundredths of a second, that the distribution rises and falls fairly regularly, and that the distribution trails off to the right. We would expect such times to trail off to the right (referred to as being positively skewed) because there is some limit on how quickly the person can respond, but really no limit on how slowly he can respond. Notice also the extreme value of 125 hundredths. This value is called an outlier because it is widely separated from the rest of the data. Outliers frequently represent errors in recording data, but in this particular case it was just a trial in which the subject couldn’t make up his mind which button to push.
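To make the floor-effect argument concrete, here is a small Python simulation. It is a sketch under assumptions of my own, not the data behind Figure 2.2: each hypothetical response time is a fixed floor of 40 hundredths plus an exponentially distributed delay with a mean of 15 hundredths, so times cannot fall below the floor but have no upper limit. The helper `sample_skewness` is a name I made up; it computes the moment coefficient of skewness, which is positive when a distribution trails off to the right.

```python
import random
import statistics

def sample_skewness(xs):
    """Moment coefficient of skewness, g1 = m3 / m2**1.5.
    Positive values indicate a longer tail to the right."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

# Assumed model (hypothetical): no response is faster than FLOOR,
# but the extra delay beyond the floor has no upper limit.
rng = random.Random(42)
FLOOR = 40.0
times = [FLOOR + rng.expovariate(1 / 15) for _ in range(2000)]

skew = sample_skewness(times)
print(f"min = {min(times):.1f}, median = {statistics.median(times):.1f}, "
      f"mean = {statistics.mean(times):.1f}, skewness = {skew:.2f}")
```

Because the long right tail pulls the mean upward, the mean should come out above the median, which is another quick sign of positive skew.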
2.3 Fitting Smooth Lines to Data
Histograms such as those shown in Figures 2.1 and 2.2 can often be used to display data in a meaningful fashion, but they have their own problems. A number of people have pointed out that histograms, as common as they are, often fail as a clear description of data. This is especially true with smaller sample sizes, where minor changes in the location or width of the intervals can make a noticeable difference in the shape of the distribution. Wilkinson (1994) has written an excellent paper on this and related problems. Maindonald and Braun (2007) give the example shown in Figure 2.3, plotting the lengths of possums. The first histogram collapses the data into bins with breakpoints at 72.5, 77.5, 82.5, .... The second uses breakpoints at 70, 75, 80, .... Notice that you might draw quite different conclusions from these two graphs depending on the breakpoints you use. The data are fairly symmetric in the histogram on the right, but have a noticeable tail to the left in the histogram on the left.
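The effect of breakpoints is easy to see by simple counting. The following Python sketch bins one small, made-up set of lengths (they are not the Maindonald and Braun possum data) twice, once with breaks at 72.5, 77.5, 82.5, ... and once with breaks at 70, 75, 80, ..., and prints a crude text histogram for each. The function `bin_counts` is a hypothetical helper; it treats each bin as including its lower break and excluding its upper one.

```python
from bisect import bisect_right

def bin_counts(data, edges):
    """Count values in bins [edges[i], edges[i+1]); values outside all bins are skipped."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        i = bisect_right(edges, x) - 1   # index of the bin whose lower break is <= x
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

# Made-up lengths in cm, purely illustrative.
lengths = [76, 79, 80, 81, 82, 83, 84, 85, 85, 86, 87, 88, 88, 89, 90, 91, 93]

edges_a = [72.5 + 5 * k for k in range(6)]   # 72.5, 77.5, ..., 97.5
edges_b = list(range(70, 101, 5))            # 70, 75, ..., 100

for label, edges in [("Breaks at 72.5, 77.5, 82.5, ...", edges_a),
                     ("Breaks at 70, 75, 80, ...", edges_b)]:
    print(label)
    for lo, hi, c in zip(edges, edges[1:], bin_counts(lengths, edges)):
        print(f"  {lo:5.1f} to {hi:5.1f}  {'*' * c}")
```

With these illustrative values, the first set of breaks gives counts of 1, 4, 6, 5, 1 and the second gives 0, 2, 5, 7, 3, 0: the same seventeen numbers, but a differently placed peak.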
Figure 2.2 itself was actually a pretty fair representation of reaction times, but we often can do better by fitting a smoothed curve to the data, with or without the histogram itself. I will discuss two of many approaches to fitting curves: one superimposes a normal distribution (to be discussed more extensively in the next chapter), and the other uses what is known as a kernel density plot.
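As a rough illustration of the two approaches, the following Python sketch evaluates both curves on a handful of hypothetical reaction times (invented for this example, with one outlier at 125 hundredths). `normal_curve` uses the sample mean and standard deviation through the standard library's `statistics.NormalDist`; `kernel_density` averages one small normal curve centered on each observation. Both helper names are my own, and the default bandwidth, 1.06 × s × n^(-1/5), is an assumed normal-reference rule of thumb rather than the only reasonable choice.

```python
from statistics import NormalDist, stdev

def normal_curve(data):
    """Return the density of a normal curve with the sample mean and SD of data."""
    return NormalDist.from_samples(data).pdf

def kernel_density(data, bandwidth=None):
    """Return a Gaussian kernel density estimate: the average of one normal
    curve (sd = bandwidth) centered on each observation."""
    n = len(data)
    if bandwidth is None:
        # Assumed normal-reference rule of thumb: h = 1.06 * s * n^(-1/5)
        bandwidth = 1.06 * stdev(data) * n ** (-1 / 5)
    kernels = [NormalDist(x, bandwidth) for x in data]
    return lambda t: sum(k.pdf(t) for k in kernels) / n

# Hypothetical reaction times in hundredths of a second (made up for illustration).
times = [52, 55, 57, 58, 60, 61, 61, 63, 64, 66, 68, 70, 74, 79, 88, 125]

normal_pdf = normal_curve(times)
kde_pdf = kernel_density(times)

for t in range(40, 141, 10):
    print(f"{t:4d}  normal {normal_pdf(t):.4f}  kernel {kde_pdf(t):.4f}")
```

Notice that near 125 the kernel estimate keeps a small bump for the outlier, whereas the single normal curve is almost flat there; the kernel approach follows the data more closely, at the cost of having to choose a bandwidth.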
Fitting a Normal Curve
Although you have not yet read Chapter 3, you should be generally familiar with a normal curve. It is often referred to as a bell curve and is symmetrical around the center of the distribution, tapering off on both ends. The normal distribution has a specific definition, but
Figure 2.3 Two different histograms plotting the same data on lengths of possums. Left panel: breaks at 72.5, 77.5, 82.5, etc.; right panel: breaks at 75, 80, 85, etc. Both panels plot Frequency against Total length (cm).