Statistical Analysis for Education and Psychology Researchers


unknown population regression parameter. The sample standard deviation of Y about the
regression line, S (the square root of the mean square error, that is, the sums of squares
for error divided by the error degrees of freedom), is used to estimate the unknown
population standard deviation of Y about the regression line, σ (sigma). The population
regression model and corresponding estimated (sample) regression equation are:
y = β0 + β1x1 + ε    Population model
Ŷ = b0 + b1x1    Estimated equation


where the population regression model specifies an observed value of y for a particular
value of x1, the explanatory variable. Sample statistics are used to estimate the
corresponding population parameters. In the estimated sample regression equation, Ŷ
denotes the predicted (estimated) value of the response variable y given the value of the
explanatory variable x1.
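As a minimal sketch of the quantity S described above, the following Python function (an illustration, not part of the book) computes the sample standard deviation of Y about the regression line from observed and predicted values:

```python
import math

def residual_std_error(y, y_hat):
    """Sample standard deviation of Y about the regression line:
    S = sqrt(SS_error / df_error), where SS_error is the sum of squared
    errors and df_error = n - 2 for simple linear regression (two
    estimated parameters, b0 and b1)."""
    n = len(y)
    ss_error = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    return math.sqrt(ss_error / (n - 2))
```

S estimates σ, the unknown population standard deviation of Y about the regression line.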
The procedure used to find the best-fitting regression line, or least squares line, is called
the method of least squares. The principle of least squares involves determining the
regression statistics b0 and b1 such that errors of estimation are minimized. An error of
estimation is the difference between the observed value of y and the corresponding
predicted value, Ŷ, obtained from the regression model. That is, e = y − Ŷ = y − (b0 + b1x1).
The error estimates in a sample are called residuals.
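The definition of a residual translates directly into code. This short Python helper (an illustrative sketch; the function name is my own) computes the residuals for a fitted line:

```python
def residuals(y, x, b0, b1):
    """Residuals e_i = y_i - Yhat_i, where Yhat_i = b0 + b1 * x_i is the
    predicted value from the estimated regression equation."""
    return [yi - (b0 + b1 * xi) for yi, xi in zip(y, x)]
```

For example, with the fitted line Ŷ = 1 + 2x, observed points (1, 3.5) and (2, 4.5) give residuals 0.5 and −0.5.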


Why is the least squares method used?

An error of estimation (prediction) for a linear regression model may be either positive or
negative, and if these errors were simply summed the total would equal zero because
errors of opposite sign cancel each other out. If, however, the sum of squared errors is
evaluated, the result is a positive number. The optimal situation is the one with minimal
error of prediction, that is, when the sum of squared errors is minimized. A mathematical
procedure called differentiation allows values of the regression statistics b0 and b1 to be
chosen which minimize the sum of squared errors of estimation.
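Differentiating the sum of squared errors with respect to b0 and b1 and setting the derivatives to zero yields the familiar closed-form solutions. A minimal Python sketch (assuming simple one-predictor regression, as in the text):

```python
def least_squares(x, y):
    """Least squares estimates for simple linear regression:
    b1 = S_xy / S_xx  and  b0 = ybar - b1 * xbar,
    the values that minimise the sum of squared errors of estimation."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    s_xx = sum((xi - xbar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = ybar - b1 * xbar
    return b0, b1
```

A consequence of these estimates is that the residuals always sum to zero, which is why the squared (rather than raw) errors must be examined.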


Estimation and Prediction

Using a regression model, a researcher may want to estimate the intercept and slope
parameters and thereby describe the nature of the dependence between the response and
explanatory variables. Once the parameters have been estimated, they can be used to
predict the unknown value of a response variable from the known value of an
explanatory variable. However, values of an explanatory variable beyond the sample
range of that variable should not be used to predict the value of the response variable Y,
because the errors of prediction are likely to be inflated.
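The extrapolation warning above can be made concrete with a small guard. This Python sketch (an illustration; the range check is my own addition, not a method from the text) refuses to predict outside the sample range of the explanatory variable:

```python
def predict(x_new, x_sample, b0, b1):
    """Predict Y for a new x value from the estimated regression
    equation Yhat = b0 + b1 * x. Extrapolation beyond the sample range
    of the explanatory variable inflates prediction error, so it is
    refused here."""
    if not (min(x_sample) <= x_new <= max(x_sample)):
        raise ValueError("x_new lies outside the sample range; "
                         "extrapolation is not advised")
    return b0 + b1 * x_new
```

Within the sample range the prediction is simply the fitted value; outside it, the function raises an error rather than returning a figure with inflated uncertainty.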


Tests of Significance and Confidence Intervals

To test whether the linear regression model is useful for prediction we need to test
whether the explanatory variable X does in fact explain variation in the response variable
Y. If X contributes no information to the prediction of Y, the true slope of the population


Inferences involving continuous data 253