Statistical Methods for Psychology

As you may remember from high school, the equation of a straight line is an equation
of the form Y = bX + a. For our purposes, we will write the equation as

Ŷ = bX + a

where

Ŷ = the predicted value of Y
b = the slope of the regression line (the amount of difference in Ŷ associated with a
one-unit difference in X)
a = the intercept (the value of Ŷ when X = 0)
X = the value of the predictor variable
Our task will be to solve for those values of a and b that will produce the best-fitting linear
function. In other words, we want to use our existing data to solve for the values of a and
b such that the regression line (the values of Ŷ for different values of X) will come as close
as possible to the actual obtained values of Y. But how are we to define the phrase "best-
fitting"? A logical way would be in terms of errors of prediction, that is, in terms of the
(Y − Ŷ) deviations. Since Ŷ is the value of the symptoms variable (lnSymptoms) that our
equation would predict for a given level of stress, and Y is a value that we actually obtained,
(Y − Ŷ) is the error of prediction, usually called the residual. We want to find the line (the
set of Ŷs) that minimizes such errors. We cannot just minimize the sum of the errors, how-
ever, because for an infinite variety of lines (any line that goes through the point (X̄, Ȳ))
that sum will always be zero. (We will overshoot some and undershoot others.) Instead, we
will look for the line that minimizes the sum of the squared errors, that is, the line that
minimizes Σ(Y − Ŷ)². (Note that I said much the same thing in Chapter 2 when I was discussing the
variance. There I was discussing deviations from the mean, and here I am discussing devia-
tions from the regression line, a sort of floating or changing mean. These two concepts,
errors of prediction and variance, have much in common, as we shall see.)⁵
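The claim that the plain sum of errors is useless as a criterion is easy to check directly. A minimal sketch in Python, using invented data rather than the book's stress scores: for any slope b, a line forced through the mean point (X̄, Ȳ) has residuals that sum to zero, while the sum of squared residuals still distinguishes between candidate lines.

```python
# Invented illustrative data (not the book's stress/symptoms values).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 2.5, 4.5, 4.0, 6.0]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

def errors(b):
    """Residuals Y - Y_hat for a line of slope b through (x_bar, y_bar)."""
    a = y_bar - b * x_bar            # forces the line through the mean point
    return [y - (b * x + a) for x, y in zip(xs, ys)]

for b in (0.5, 0.9, 1.5):
    res = errors(b)
    # The plain sum of residuals is ~0 for every slope,
    # but the sum of squared residuals differs from line to line.
    print(f"b={b}: sum of errors = {sum(res):+.3f}, "
          f"sum of squared errors = {sum(r * r for r in res):.3f}")
```

Every line through the mean point overshoots and undershoots in equal measure, so only the squared-error criterion can tell the lines apart.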
The optimal values of a and b can be obtained by solving for those values of a and b
that minimize Σ(Y − Ŷ)². The solution is not difficult, and those who wish can find it in
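The resulting formulas are the standard least-squares ones: b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² and a = Ȳ − bX̄. A minimal sketch in Python (the data are invented for illustration, not the book's stress/symptom values):

```python
def least_squares(xs, ys):
    """Slope and intercept minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar   # the fitted line always passes through (x_bar, y_bar)
    return b, a

# Invented data, for illustration only
b, a = least_squares([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.5, 4.5, 4.0, 6.0])
print(f"Y_hat = {b:.3f} X + {a:.3f}")  # Y_hat = 0.950 X + 0.950
```

Note that the intercept formula a = Ȳ − bX̄ is exactly what guarantees the property used above: the least-squares line passes through the point of means.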



Ŷ = 0.009(Stress) + 4.300
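Read as a prediction rule, the fitted equation is just arithmetic. A minimal sketch in Python using the rounded coefficients above (the helper name is mine, not the book's):

```python
def predict_ln_symptoms(stress):
    """Predicted lnSymptoms for a given stress score, using the
    rounded coefficients b = 0.009 and a = 4.300."""
    return 0.009 * stress + 4.300

# A stress score of 30 predicts lnSymptoms of 0.009*30 + 4.300 = 4.570
print(f"{predict_ln_symptoms(30):.3f}")  # 4.570
```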

254 Chapter 9 Correlation and Regression


4.2

0 1020304050 60

4.4

4.6

4.8

5.0

Stress

InSymptoms

Figure 9.2 Scatterplot of log(symptoms) as a function of stress

⁵ For those who are interested, Rousseeuw and Leroy (1987) present a good discussion of alternative criteria that could be minimized, often to good advantage.