Basic Statistics

LINEAR REGRESSION: SINGLE SAMPLE 173

distributed; many computer programs will allow the user to check the distribution of
the residuals using normal probability plots or other visual methods.
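As a rough illustration of such a check, the following Python sketch computes the points of a normal probability plot for the residuals of a least-squares fit. The function names, the simulated data, and the plotting positions (i - 0.5)/n are assumptions chosen for illustration; statistical packages may use slightly different conventions.

```python
import math
import random
from statistics import NormalDist


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(x, y):
    """Observed y minus the fitted value on the least-squares line."""
    a, b = fit_line(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def normal_plot_points(values):
    """Pair each ordered value with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention (an assumption).
    """
    n = len(values)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(values)))


def plot_correlation(points):
    """Pearson correlation of the plotted points; near 1 suggests a straight line."""
    n = len(points)
    mean_q = sum(q for q, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    sqq = sum((q - mean_q) ** 2 for q, _ in points)
    svv = sum((v - mean_v) ** 2 for _, v in points)
    sqv = sum((q - mean_q) * (v - mean_v) for q, v in points)
    return sqv / math.sqrt(sqq * svv)


# Hypothetical data: a straight line plus normally distributed errors.
rng = random.Random(3)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]
points = normal_plot_points(residuals(x, y))
print("correlation of normal probability plot:", round(plot_correlation(points), 3))
```

If the residuals are approximately normal, the plotted points fall close to a straight line, so the correlation printed here should be close to 1. Visual inspection of the plot itself remains the usual approach.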
If we enclosed the points in Figure 12.1 within an ellipse, the ellipse would be long
and thin, indicating a strong relationship between X and Y. An ellipse closer to
a circle would indicate a weak relationship between X and Y.
Figure 12.3 also shows the position of the least-squares regression line. Note that
in Figure 12.3 the regression line is not the major (or principal) axis of the ellipse
but tends to have a smaller slope than the major axis. This is the case whenever the
relationship is positive. If the relationship is negative (negative slope coefficient), the
slope of the regression line will be closer to 0 than that of the major axis of the ellipse. In
both cases the regression line is more nearly horizontal than the major axis of the
ellipse. The least-squares line goes through the center of the ellipse and touches the
ellipse at the points where the two vertical lines touch the ellipse (are tangent to the
ellipse).
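The comparison between the two slopes can be sketched numerically. The Python example below assumes that the ellipse is the concentration ellipse of the sample, so that its major axis points along the first principal axis of the sample covariance matrix. The function names and the simulated data are hypothetical.

```python
import math
import random


def simulate_points(n, rho, sd_x=1.0, sd_y=1.0, seed=0):
    """Draw n points from a bivariate normal with correlation rho (hypothetical settings)."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = sd_x * z1
        y = sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        points.append((x, y))
    return points


def sample_moments(points):
    """Return the sample variances of X and Y and their sample covariance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return var_x, var_y, cov_xy


def least_squares_slope(var_x, cov_xy):
    """Slope of the least-squares regression of Y on X."""
    return cov_xy / var_x


def major_axis_slope(var_x, var_y, cov_xy):
    """Slope of the eigenvector of the covariance matrix with the larger eigenvalue.

    Assumes cov_xy is not zero.
    """
    half_diff = (var_y - var_x) / 2.0
    return (half_diff + math.sqrt(half_diff ** 2 + cov_xy ** 2)) / cov_xy


for rho in (0.6, -0.6):
    var_x, var_y, cov_xy = sample_moments(simulate_points(2000, rho, seed=42))
    print("rho =", rho,
          "least-squares slope =", round(least_squares_slope(var_x, cov_xy), 3),
          "major-axis slope =", round(major_axis_slope(var_x, var_y, cov_xy), 3))
```

For both the positive and the negative relationship, the least-squares slope printed is closer to 0 than the major-axis slope, in line with the description above.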
As an aside, the fact that the slope coefficient will be smaller than that of the
major axis of the ellipse when the relationship is positive has special importance in
interpreting data taken from paired samples. For example, suppose that cholesterol
levels were taken prior to treatment and 1 month later. We could make a scatter plot
of the data with the pretreatment cholesterol level on the X axis and the posttreatment
level on the Y axis. If there were no change except for that due to measurement error
and day-to-day variation in levels at the two time periods, we would expect to see
the points in a scatter diagram falling roughly within an ellipse with the major axis
having a 45° slope (slope of 1) and going through the origin. The pretreatment and
posttreatment means would be roughly equal. Note that since we will fit a least-
squares line, the regression line will probably have a slope < 1, possibly quite a
bit less if there is a great deal of variation in the results. If all the points fall very
close to a straight line, there is very little difference between the slope of the major
axis and that obtained from the least-squares regression line. Note that the same
considerations would come into play if we wished to compare the readings made by
two different observers or to compare two different laboratory methods for measuring
some constituent of blood.
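A small simulation can make this concrete. The Python sketch below assumes a simple no-change model in which each reading is a person's stable level plus an independent error (measurement error and day-to-day variation combined). Under that assumption the expected least-squares slope is the variance of the stable levels divided by the sum of that variance and the error variance. The cholesterol values, standard deviations, and function names are hypothetical.

```python
import random


def simulate_pre_post(n, true_sd, noise_sd, mean=200.0, seed=1):
    """Simulate (pretreatment, posttreatment) pairs with no real change.

    Each reading is the person's stable level plus independent error.
    All numeric settings are hypothetical.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        stable_level = rng.gauss(mean, true_sd)
        pre = stable_level + rng.gauss(0.0, noise_sd)
        post = stable_level + rng.gauss(0.0, noise_sd)
        pairs.append((pre, post))
    return pairs


def fit_pairs(pairs):
    """Return (intercept, slope) of the least-squares line of post on pre."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


for noise_sd in (20.0, 2.0):
    intercept, slope = fit_pairs(simulate_pre_post(5000, true_sd=30.0, noise_sd=noise_sd))
    expected = 30.0 ** 2 / (30.0 ** 2 + noise_sd ** 2)
    print("noise sd =", noise_sd,
          "fitted slope =", round(slope, 3),
          "expected slope under the model =", round(expected, 3))
```

With a large error standard deviation the fitted slope falls well below 1 even though nothing changed between the two readings. With a small error standard deviation the points lie close to a straight line and the slope is close to 1.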
If the assumption of a simple random sample from a bivariate normal population
can be made, we can use our sample statistics to estimate population parameters.
The population parameters that can be estimated and the sample statistics used to
estimate them are given in Table 12.3. Note from the table that α and β are used
in linear regression to identify the intercept and slope coefficient for the population
regression line. This follows the convention of using Greek letters for population
parameters. In Section 8.4, α was used to denote the chance of making a type I error
and β was used to denote the chance of making a type II error. To avoid confusion in
this chapter, we always say population intercept and population slope when referring
to the population regression line.
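Table 12.3 is not reproduced here, so the following Python sketch assumes the usual sample statistics: the sample means and variances, the least-squares intercept and slope, and the sample correlation coefficient. The function name, the dictionary keys, and the small data set are hypothetical.

```python
import math


def bivariate_estimates(x, y):
    """Sample statistics commonly used to estimate bivariate normal parameters."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample variances and covariance use the n - 1 divisor.
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    slope = cov_xy / var_x                  # estimates the population slope
    intercept = mean_y - slope * mean_x     # estimates the population intercept
    r = cov_xy / math.sqrt(var_x * var_y)   # estimates the population correlation
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "var_x": var_x,
        "var_y": var_y,
        "intercept": intercept,
        "slope": slope,
        "r": r,
    }


# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
for name, value in bivariate_estimates(x, y).items():
    print(name, "=", round(value, 4))
```

Each value is a point estimate only. Confidence intervals and tests for the corresponding population parameters require the additional results discussed in the sections that follow.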
In earlier chapters we have covered the estimation of μ_X, μ_Y, σ_X², and σ_Y² when
variables are considered one at a time. In the following sections we discuss confidence
intervals and tests of hypotheses concerning the remaining estimators.
