Basic Statistics

(Barry) #1

REGRESSION AND CORRELATION


Figure 12.3 Position of regression line for bivariate normal distribution.

because both the slope and the intercept are used in computing the line. The square
root of the residual mean square is often called the standard error of estimate.
The residual mean square measures the variation of the Y observations around
the regression line. If it is large, the vertical distances from the line are large. If it
is small, the vertical distances are small. In the example using the 10 data points,
$s^2_{y \cdot x} = 27.49$. The variance of the original 10 Y values can be computed from
Table 12.2 as 828.90/9 = 92.1, so the variance about the regression line is much
smaller than the variance of Y in this example. This indicates that the Y values tend to be
closer to $\hat{Y}$ than they are to $\bar{Y}$. Thus, using the least-squares regression line to predict
systolic blood pressure from weight gives us a closer prediction than does using the
sample mean systolic blood pressure.
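
The comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own computation: the function names are hypothetical, and the six weight and blood pressure values are made up for the example because Table 12.2 is not reproduced here. The residual sum of squares is divided by n - 2, reflecting that both the slope and the intercept are estimated, while the ordinary variance of Y is divided by n - 1. The last two lines only redo the arithmetic on the figures quoted in the text.

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residual_mean_square(x, y):
    """Sum of squared vertical distances from the fitted line, over n - 2."""
    slope, intercept = fit_line(x, y)
    rss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return rss / (len(x) - 2)

def sample_variance(y):
    """Sum of squared distances from the sample mean, over n - 1."""
    n = len(y)
    mean_y = sum(y) / n
    return sum((yi - mean_y) ** 2 for yi in y) / (n - 1)

# Hypothetical data (NOT the values in Table 12.2).
weight = [150.0, 160.0, 170.0, 180.0, 190.0, 200.0]
sbp = [118.0, 125.0, 122.0, 131.0, 136.0, 134.0]

s2_yx = residual_mean_square(weight, sbp)   # variation around the line
s2_y = sample_variance(sbp)                 # variation around the mean
print("residual mean square:", s2_yx)
print("variance of Y:", s2_y)

# Arithmetic on the figures quoted in the text.
var_y_text = 828.90 / 9                 # variance of the 10 Y values
se_estimate_text = math.sqrt(27.49)     # standard error of estimate
```

When the residual mean square is much smaller than the variance of Y, as in the text's example, the fitted line predicts Y more closely than the sample mean does.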

12.2.6 Model Underlying Single-Sample Linear Regression

Up to this point in this chapter, we have discussed computing a regression line from
a single sample. To compute confidence limits and to make tests of hypotheses, we
shall make the basic assumption that we have taken a simple random sample from a
population where X and Y follow a bivariate normal distribution. In Chapter 6 we
discussed a normal distribution for a single variable X. For X and Y to be bivariately
normal requires not only that both X and Y be individually normally distributed but
also that the relationship between X and Y has a bivariate normal distribution (see
van Belle et al. [2004]).
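
To make the model concrete, the following sketch draws a simple random sample from a bivariate normal population. It is an illustration under stated assumptions rather than anything taken from the text: the function names and all parameter values (means, standard deviations, and the correlation rho) are hypothetical. It uses the standard construction in which Y is built from the same standard normal draw as X plus an independent one, so that X and Y are each normally distributed and have population correlation rho.

```python
import math
import random

def sample_bivariate_normal(n, mean_x, mean_y, sd_x, sd_y, rho, seed=0):
    """Draw n (x, y) pairs from a bivariate normal distribution.

    z1 and z2 are independent standard normal draws; y shares z1 with x,
    which gives the pair a population correlation of rho.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mean_x + sd_x * z1
        y = mean_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs

def sample_correlation(pairs):
    """Sample correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical population values, chosen only for illustration.
pairs = sample_bivariate_normal(20000, mean_x=170.0, mean_y=125.0,
                                sd_x=20.0, sd_y=10.0, rho=0.6, seed=1)
print("sample correlation:", sample_correlation(pairs))
```

With a large sample, the sample correlation should land close to the chosen rho, and a scatter diagram of the pairs should look roughly elliptical.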
One way of determining whether the data points may be from a bivariate normal
distribution is to examine the scatter diagram. If the variables have a bivariate
normal distribution, data points should lie approximately within an ellipse. An
ellipse is depicted in Figure 12.3. Also, if the data points (X, Y) follow a bivariate
normal distribution, both X and Y separately should be normally distributed. Normal
probability plots can be used to check this. The residuals should be normally