9.12 One Final Example
I want to introduce one final example because it illustrates several important points about
correlation and regression. This example is about as far away from psychology as you can
get and really belongs to physicists and astronomers, but it is a fascinating example taken
from Todman and Dugard (2007) and it makes a very important point. We have known for
over one hundred years that the distances from the sun to the planets in our solar system
follow a neat pattern. The distances are shown in Table 9.7, which includes Pluto
even though it was recently demoted. (How neatly Pluto fits the pattern of the other
planets, as we will see, might suggest that its demotion was rather unfair.)
If we plot these in their original units we find a very neat graph that is woefully far
from linear. The plot is shown in Figure 9.7a. I have superimposed the linear regression
line on that plot even though the relationship is clearly not linear. In Figure 9.7b, you can
see the residuals from the previous regression plotted as a function of rank, with a spline
superimposed. The residuals show you that there is obviously something going on because
they follow a very neat pattern. This pattern would suggest that the data might better be fit
with a logarithmic transformation of distance.
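The first regression can be reproduced in a few lines. What follows is only a minimal sketch in Python, which is not the software used for Figure 9.7; the helper name fit_line is hypothetical, and the data are the nine values from Table 9.7. It fits distance on rank by ordinary least squares and prints the residuals, without the spline.

```python
# Distances from Table 9.7 (astronomical units), ordered by rank from the sun.
distance = [0.39, 0.72, 1.00, 1.52, 5.20, 9.54, 19.18, 30.06, 39.44]
rank = list(range(1, len(distance) + 1))


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b).

    Hypothetical helper written for this sketch, not taken from the text.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


intercept, slope = fit_line(rank, distance)

# Residual = observed distance minus the distance predicted from rank.
residuals = [d - (intercept + slope * r) for r, d in zip(rank, distance)]

for r, e in zip(rank, residuals):
    print(f"rank {r}: residual {e:+.2f}")
```

The residuals should come out positive for the two innermost and the two outermost planets and negative for the five in between, which is the systematic curve described above.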
In the lower left of Figure 9.7, we see the logarithm of distance plotted against the rank
distance, and we should be very impressed with our choice of variable. The relationship is
very nearly linear as you can see by how closely the points stay to the regression line.
However, the pattern that you see there should make you a bit nervous about declaring the
relationship to be logarithmic, and this is verified by plotting the residuals from this regression
against rank distance, as has been done in the lower right. Notice that we still have a clear
pattern to the residuals. This indicates that, even though we have done a nice job of fitting
the data, there is still systematic variation in the residuals. I am told that astronomers still
do not have an explanation for the second set of residuals, but it is obvious that an
explanation is needed.
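The second regression can be sketched the same way. This is a minimal illustration in Python, not the analysis behind Figure 9.7; fit_line is a hypothetical helper, and the use of the natural logarithm is an assumption, since the text does not state a base (a different base would only rescale the slope and the residuals).

```python
import math

# Distances from Table 9.7 (astronomical units), ordered by rank from the sun.
distance = [0.39, 0.72, 1.00, 1.52, 5.20, 9.54, 19.18, 30.06, 39.44]
rank = list(range(1, len(distance) + 1))

# Assumption: natural logarithm; the text does not say which base was used.
log_distance = [math.log(d) for d in distance]


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b).

    Hypothetical helper written for this sketch, not taken from the text.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


intercept, slope = fit_line(rank, log_distance)
residuals = [y - (intercept + slope * r) for r, y in zip(rank, log_distance)]

# Proportion of variance in log(distance) accounted for by rank.
mean_y = sum(log_distance) / len(log_distance)
ss_total = sum((y - mean_y) ** 2 for y in log_distance)
ss_residual = sum(e ** 2 for e in residuals)
r_squared = 1 - ss_residual / ss_total

print(f"r squared = {r_squared:.3f}")
for r, e in zip(rank, residuals):
    print(f"rank {r}: residual {e:+.3f}")
```

With these nine values the line should account for roughly 98 percent of the variance in log distance, yet the residuals are not scattered at random: they dip for Earth and Mars and rise for Jupiter through Neptune, which is the remaining pattern that still needs explaining.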
I have chosen this example for several reasons. First, it illustrates the difference
between psychology and physics. I can’t imagine any meaningful variable that psychologists
study that has the precision of the variables in the physical sciences. In psychology you
will never see data fit as well as this. Second, this example illustrates the importance of
looking at residuals, because they basically tell you where your model is going wrong. It
was evident in the first plot, in the upper left, that something very systematic and
nonlinear was going on, and that continued to be the case when we plotted log(distance) against
rank distance. There the residuals made it clear that there was still more to be explained.
Finally, this example nicely illustrates the interaction between regression analyses and theory.
No one in their right mind would be likely to be excited about using regression to predict
the distance of each planet from the sun. We already know those distances. What is
important is that by identifying just what that relationship is we can add to or confirm theory.
Presumably it is obvious to a physicist what it means to say that the relationship is
logarithmic. (I would assume it relates to the fact that gravity varies inversely with the square
of the distance, but what do I know.) But even after we explain the logarithmic relationship
we can see that there is more that needs explaining. Psychologists use regression for the
Table 9.7 Distance from the sun in astronomical units
Planet     Mercury  Venus  Earth  Mars  Jupiter  Saturn  Uranus  Neptune  Pluto
Rank       1        2      3      4     5        6       7       8        9
Distance   0.39     0.72   1.00   1.52  5.20     9.54    19.18   30.06    39.44