that the score could have been 73 or 86, but it is not at all likely that the score would have
been 20 or 100. In other words, there is a distribution of alternative possibilities around any
obtained value, and this is true for all obtained values. We will use this fact to produce an
overall curve that usually fits the data quite well.
Kernel estimates can be illustrated graphically by taking an example from Everitt and
Hothorn (2006). They used a very simple set of data with the following values for the
dependent variable (X).
X: 0.0  1.0  1.1  1.5  1.9  2.8  2.9  3.5

If you plot these points along the X-axis and superimpose small distributions represent-
ing alternative values that might have been obtained instead of the actual values you have,
you obtain the distribution shown in Figure 2.5a. Everitt and Hothorn refer to these small
distributions by a technical name: “bumps.” Notice that these bumps are normal distribu-
tions, but I could have specified some other shape if I thought that a normal distribution
was inappropriate.
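A minimal sketch of what these bumps look like in R, using the Everitt and Hothorn values; the bandwidth (the spread of each bump, here h = 0.4) is my own assumption, chosen only so the picture resembles Figure 2.5a.

```r
# Draw one normal "bump" over each observed score (Everitt & Hothorn data).
x <- c(0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5)
h <- 0.4                                   # assumed bandwidth, not from the text
grid <- seq(-1, 4.5, length.out = 200)     # values of X at which to draw the curves

plot(grid, rep(0, length(grid)), type = "n", ylim = c(0, 1.1),
     xlab = "X", ylab = "Y(X)")
for (xi in x) lines(grid, dnorm(grid, mean = xi, sd = h))  # one bump per score
rug(x)                                     # tick marks showing the raw observations
```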
Now we will literally sum these bumps vertically. For example, suppose that we name
each bump by the score over which it is centered. Above a value of 3.8 on the X-axis you
have a small amount of bump_2.8, a little bit more of bump_2.9, and a good bit of
bump_3.5. You can add the heights of these three bumps at X = 3.8 to get the kernel density of
the overall curve at that position. You can do the same for every other value of X. If you do
so you find the distribution plotted in Figure 2.5b. Above the bumps we have a squiggly
distribution (to use another technical term) that represents our best guess of the distribution
underlying the data that we began with.
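The summing step can also be sketched in a few lines of R. This is only an illustration under the same assumed bandwidth of h = 0.4; dividing the summed bump heights by the number of scores rescales the curve so that it integrates to 1, as a density should.

```r
# Sum the bumps vertically at each value of X to get the kernel density estimate.
x <- c(0.0, 1.0, 1.1, 1.5, 1.9, 2.8, 2.9, 3.5)
h <- 0.4                                   # assumed bandwidth
grid <- seq(-1, 4.5, length.out = 200)

bump.sum <- sapply(grid, function(x0) sum(dnorm(x0, mean = x, sd = h)))
kde <- bump.sum / length(x)                # rescale so the curve is a proper density

plot(grid, kde, type = "l", xlab = "X", ylab = "Y(X)")
rug(x)
lines(density(x), lty = 2)                 # R's built-in estimator, for comparison
```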
Now we can go back to the reaction time data and superimpose the kernel density func-
tion on that histogram. (I am leaving off the bumps as there are too many of them to be leg-
ible.) This resulting plot is shown in Figure 2.6. Notice that this curve does a much better
job of representing the data than did the superimposed normal distribution. In particular it
fits the tails of the distribution quite well.
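A sketch of how such an overlay can be produced in R, in the spirit of Figure 2.6; the vector name rt and the generated values are stand-ins for the reaction time data, which are not reproduced here.

```r
# Superimpose a kernel density estimate on a histogram of reaction times.
rt <- rgamma(300, shape = 4, rate = 0.01)  # placeholder data for illustration only
hist(rt, breaks = 30, freq = FALSE,        # freq = FALSE puts the histogram on a density scale
     xlab = "Reaction time (ms)", main = "")
lines(density(rt), lwd = 2)                # kernel density curve, not a fitted normal
```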
Version 16 of SPSS fits kernel density plots using syntax, and you can fit them using
SAS and S-Plus (or its close cousin R). It is fairly easy to find examples for those programs
on the Internet. As psychology expands into more areas, and particularly into the


Figures 2.5a and 2.5b Illustration of the kernel density function for X (both panels plot Y(X) against values of X from about –1 to 4)
