conditional probability distribution, it is assumed that this will not affect
the chance that some other observation in the sample will lie below the
mean of its conditional probability distribution. Obviously, this assump-
tion need not be true.
Fourth, regression analysis assumes that the conditional probability distribution of Y is normal. Actually, not all aspects of regression analysis require this assumption, but some do. It is also worth noting that in regression analysis only Y is regarded as a random variable. The values of X are assumed to be fixed. Thus, when regression analysis is used to estimate Y on the basis of X, the true value of Y is subject to error, but the value of X is known.
The four assumptions underlying regression analysis can be stated somewhat differently. Together they imply that each observation can be written as Y_i = α + βX_i + e_i, where Y_i is the ith observed value of the dependent variable, X_i is the ith observed value of the independent variable, and e_i is a normally distributed random variable with a mean of zero and a standard deviation equal to σ_e. Essentially, e_i is an error term (that is, a random variable). Because of the presence of this error term, the observed values of Y_i fall around the population regression line, not on it. (See Figure 5.3.)
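A minimal simulation makes this concrete: fixing the X values and drawing each e_i from a normal distribution with mean zero scatters the observed Y_i around the population regression line. The parameter values (α, β, σ_e) below are made up for illustration; they are not the ones behind Figure 5.3.

```python
import random

# Hypothetical population parameters, chosen only for illustration
alpha, beta, sigma_e = 2.0, 0.5, 1.0

random.seed(42)
x_values = [float(x) for x in range(1, 21)]  # X is treated as fixed, not random

# Each Y_i = alpha + beta*X_i + e_i, with e_i ~ N(0, sigma_e)
y_values = []
for x in x_values:
    e = random.gauss(0.0, sigma_e)
    y_values.append(alpha + beta * x + e)

# Deviations from the population line: nonzero point by point,
# but averaging near zero because e_i has mean zero
residuals = [y - (alpha + beta * x) for x, y in zip(x_values, y_values)]
mean_resid = sum(residuals) / len(residuals)
```

Because the e_i average out to roughly zero, the points straddle the line rather than sitting on it, which is exactly the pattern described above.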
The case of simple, linear, least squares regression may be written in
the form:
Y = α + βX + ε    (5.1)
where Y, the dependent variable, is a linear function of X, the independent variable. The parameters α and β characterize the population regression line and ε is the randomly distributed error term. The regression estimates of α and β will be derived from the principle of least squares. In applying least squares, the sum of the squared regression errors will be minimized; our regression errors equal the actual dependent variable minus the estimated value from the regression line. If Y represents the actual value and Ŷ the estimated value, their difference is the error term, e. Least squares regression minimizes the sum of the squared error terms. The simple regression line will yield an estimated value of Y, Ŷ, by the use of the sample regression:

Ŷ = â + b̂X    (5.2)

In the estimation equation (5.2), â is the least squares estimate of α and b̂ is the estimate of β. Thus, α and β are the regression constants that must be estimated. The least squares regression constants (or statistics) â and b̂ are unbiased and efficient (smallest variance) estimators of α and β. The error