earlier editions of this book or in Draper and Smith (1981, p. 13). The solution to the problem
yields what are often called the normal equations:

b = cov_XY / s²_X

a = Ȳ − bX̄

We now have equations for a and b^6 that will minimize Σ(Y − Ŷ)². To indicate that our
solution was designed to minimize errors in predicting Y from X (rather than the other way
around), the constants are sometimes denoted a_Y·X and b_Y·X. When no confusion would
arise, the subscripts are usually omitted. (When your purpose is to predict X on the basis of
Y [i.e., X on Y], then you can simply reverse X and Y in the previous equations.)
As an example of the calculation of regression coefficients, consider the data in Table 9.2.
From that table we know that X̄ = 21.290, Ȳ = 4.483, and s_X = 12.492. We also know that
cov_XY = 1.336. Thus,

b = cov_XY / s²_X = 1.336 / 12.492² = 0.0086

a = Ȳ − bX̄ = 4.483 − (0.0086)(21.290) = 4.300

Ŷ = bX + a = (0.0086)(X) + 4.300

We have already seen the scatter diagram with the regression line for Y on X superimposed
in Figure 9.2. This is the equation of that line.^7
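These two calculations are easy to check numerically. The following sketch reproduces the arithmetic from the summary statistics quoted above, rounding b to four decimal places just as the text does before computing a:

```python
# Check of the slope and intercept using the summary statistics
# quoted from Table 9.2 (cov_XY = 1.336, s_X = 12.492, means as given).
cov_xy = 1.336
s_x = 12.492
x_bar, y_bar = 21.290, 4.483

b = round(cov_xy / s_x**2, 4)   # slope; rounds to 0.0086 as in the text
a = y_bar - b * x_bar           # intercept; 4.300 to three decimals
print(b, round(a, 3))
```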
A word about actually plotting the regression line is in order here. To plot the line, you
can simply take any two values of X (preferably at opposite ends of the scale), calculate Ŷ
for each, mark these coordinates on the figure, and connect them with a straight line. For
our data, we have

Ŷ_i = (0.0086)(X_i) + 4.300

When X_i = 0,

Ŷ_i = (0.0086)(0) + 4.300 = 4.300

and when X_i = 50,

Ŷ_i = (0.0086)(50) + 4.300 = 4.730

The line then passes through the points (X = 0, Y = 4.300) and (X = 50, Y = 4.730), as
shown in Figure 9.2. The regression line will also pass through the points (0, a) and (X̄, Ȳ),
which provides a quick check on accuracy.
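The plotting recipe can be sketched by wrapping the line's equation in a small function and evaluating it at the two endpoint values of X used above (the name y_hat is just an illustrative choice):

```python
# Evaluate the fitted line at two X values to get the two points
# that anchor a hand-drawn (or software-plotted) regression line.
def y_hat(x):
    return 0.0086 * x + 4.300

p0 = (0, y_hat(0))     # (0, 4.300), which is also the point (0, a)
p1 = (50, y_hat(50))   # (50, 4.730)
print(p0, p1)
```

Passing these two coordinate pairs to any plotting routine draws the same line shown in Figure 9.2.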
If you calculate both regression lines (Y on X and X on Y), it will be apparent that the
two are not coincident. They do intersect at the point (X̄, Ȳ), but they have different slopes.
The fact that they are different lines reflects the fact that they were designed for different
purposes: one minimizes Σ(Y − Ŷ)² and the other minimizes Σ(X − X̂)². They both go
through the point (X̄, Ȳ) because a person who is average on one variable would be ex-
pected to be average on the other, but only when the correlation between the two variables
is ±1.00 will the lines be coincident.
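This point can be illustrated with a small made-up data set (hypothetical values, not the Table 9.2 data): fit Y on X and X on Y separately, then verify that both lines pass through the point of means while their slopes differ.

```python
# Hypothetical data for illustration only (not from Table 9.2).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
var_y = sum((y - y_bar) ** 2 for y in ys) / (n - 1)

# Y on X: minimizes sum(Y - Yhat)^2
b_yx = cov / var_x
a_yx = y_bar - b_yx * x_bar
# X on Y: minimizes sum(X - Xhat)^2
b_xy = cov / var_y
a_xy = x_bar - b_xy * y_bar

# Both lines pass through (x_bar, y_bar) ...
assert abs((b_yx * x_bar + a_yx) - y_bar) < 1e-9
assert abs((b_xy * y_bar + a_xy) - x_bar) < 1e-9
# ... and the product of the two slopes equals r^2, so the lines
# coincide only when the correlation is +1 or -1.
r_squared = b_yx * b_xy
print(round(r_squared, 3))
```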
Section 9.5 The Regression Line 255
(^6) An interesting alternative formula for b can be written as b = r(s_Y/s_X). This shows explicitly the relationship be-
tween the correlation coefficient and the slope of the regression line. Note that when s_Y = s_X, b will equal r. (This
will happen when both variables have a standard deviation of 1, which occurs when the variables are standardized.)
(^7) An excellent Java applet that allows you to enter individual data points and see their effect on the regression line
is available at http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html.
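Footnote 6's claim is also easy to verify numerically: standardize both variables and the fitted slope equals r. A minimal sketch with made-up data (hypothetical, not from the text):

```python
import statistics as st

# Hypothetical data for illustration.
xs = [2.0, 4.0, 6.0, 8.0]
ys = [1.0, 3.0, 2.0, 5.0]
n = len(xs)

def z(values):
    # Convert to z-scores (mean 0, sample standard deviation 1).
    m, s = st.mean(values), st.stdev(values)
    return [(v - m) / s for v in values]

zx, zy = z(xs), z(ys)
# Slope of the regression of zy on zx.
b_std = (sum(u * v for u, v in zip(zx, zy)) / (n - 1)) / st.variance(zx)

# Correlation computed from the raw data.
cov = sum((x - st.mean(xs)) * (y - st.mean(ys)) for x, y in zip(xs, ys)) / (n - 1)
r = cov / (st.stdev(xs) * st.stdev(ys))

assert abs(b_std - r) < 1e-9   # b equals r after standardizing
```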