Basic Statistics

(Barry) #1
LINEAR REGRESSION: SINGLE SAMPLE

[Figure omitted; horizontal axis: Weight (lb)]

Figure 12.2 Least-squares regression line, $\hat{Y} = 80.74 + .2903X$.


from the regression line. For the example with the 10 adult males, we would have 10
residuals. Each residual indicates whether the systolic blood pressure is high or low
considering the individual’s weight.
If the 10 values of $Y - \hat{Y}$ are squared and added together, the sum
$\sum (Y - \hat{Y})^2$ is calculated to be 219.94. This sum is smaller for the line
$\hat{Y} = 80.74 + .29X$ than it would be for any other straight line we could draw to
fit the data. For this reason we call the line the least-squares line. The regression
line is considered to be the best-fitting line to the data in the sense that the sum of
squares in the vertical direction is as small as possible.
The sum of the 10 values of $Y - \hat{Y}$ is approximately 0. If there were no rounding
off in the calculations, it would be precisely 0. The sum of the signed vertical
deviations from the regression line is always 0. Equivalently, the sum of the
$\hat{Y}$'s always equals the sum of the Y's (within rounding error).
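
The properties just described can be checked numerically. The following is a minimal sketch in Python, not the book's own computation: the six weight and blood pressure values are hypothetical illustration data (they are not the 10 adult males of the example), and the function names `fit_least_squares` and `sum_sq_residuals` are invented here. It fits the least-squares line, confirms that the residuals sum to 0 within rounding error, and confirms that nudging the intercept or slope only increases the sum of squared residuals.

```python
# Hypothetical illustration data (NOT the textbook's 10 adult males).
weights = [150.0, 165.0, 180.0, 195.0, 210.0, 225.0]     # X, in lb
pressures = [122.0, 130.0, 131.0, 139.0, 138.0, 147.0]   # Y, in mmHg


def fit_least_squares(x, y):
    """Return (a, b), the intercept and slope of the least-squares line Y-hat = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_sq_residuals(x, y, a, b):
    """Sum of squared vertical deviations of the points from the line a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


a, b = fit_least_squares(weights, pressures)
fitted = [a + b * xi for xi in weights]
residuals = [yi - fi for yi, fi in zip(pressures, fitted)]
sse = sum_sq_residuals(weights, pressures, a, b)

print(f"sum of residuals      = {sum(residuals):.10f}")
print(f"sum of Y              = {sum(pressures):.4f}")
print(f"sum of fitted values  = {sum(fitted):.4f}")
print(f"sum of squared resid. = {sse:.4f}")

# Shifting the intercept or slope away from the fitted values
# gives a strictly larger sum of squared residuals.
for da, db in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.01), (0.0, -0.01)]:
    assert sum_sq_residuals(weights, pressures, a + da, b + db) > sse
```

The perturbation loop tries only four nearby lines, so it illustrates the minimizing property; it does not prove it.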


12.2.5 The Variance of the Residuals


The 10 residuals $Y - \hat{Y}$ are simply a set of 10 numbers, so we can calculate their
variance. This variance is called the residual mean square and its formula is


$$s_{y \cdot x}^2 = \frac{\sum (Y - \hat{Y})^2}{n - 2}$$
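
As a rough sketch of how this formula might be evaluated, the Python snippet below divides a sum of squared residuals by n - 2 and takes the square root. The function name `residual_mean_square` is hypothetical; the inputs are the sum of squares 219.94 and the sample size n = 10 already reported for the example.

```python
import math


def residual_mean_square(sum_sq_residuals, n):
    """Residual mean square: the sum of squared residuals divided by n - 2."""
    if n <= 2:
        raise ValueError("need more than 2 data points to divide by n - 2")
    return sum_sq_residuals / (n - 2)


# Sum of squared residuals and sample size reported for the 10 adult males.
s2 = residual_mean_square(219.94, 10)
s = math.sqrt(s2)

print(f"residual mean square        = {s2:.2f}")
print(f"residual standard deviation = {s:.2f} mmHg")
```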

For the 10 data points, $s_{y \cdot x}^2 = 219.94/(10 - 2) = 27.49$, and taking the square
root we obtain $s_{y \cdot x} = 5.24$ mmHg for the standard deviation of the residuals. An
obvious difference between the variance of the residuals and the variances of X and Y that
we obtained earlier is that here we divided by n - 2 instead of n - 1. The n - 2 is used
