In simple linear regression analysis, a random sample of observations is selected from
a defined population of interest, and the data consist of quantitative (continuous)
measurements on a response variable and, usually, quantitative measurements on an
explanatory variable (sometimes called an independent variable). Often in educational research
regression analysis is used with survey data as opposed to data generated from
experimental designs. Regression is sometimes seen as a completely different
analytic technique, unrelated to analysis of variance, possibly because
the two techniques arose in different research traditions. In fact, both techniques are based
on the General Linear Model (GLM). In its simplest form the GLM says that a response
variable is related to an independent variable(s) by a weighting factor and that the
response variable is given by the sum of all weighted independent variables. The term
general linear model means a general statistical model that is linear in its parameters.
That is, the parameter-weighted independent variables are summed. Regression and
analysis of variance are simply different variants of the same General Linear Model and
different disciplines have traditionally favoured certain research approaches and analytic
techniques.
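The weighted-sum form of the GLM can be sketched numerically. The following is a minimal illustration in Python with made-up data (the variable names and values are not from the text): the response is generated as an intercept plus a weighted explanatory variable plus random error, and the weights are recovered by least squares.

```python
import numpy as np

# Illustrative sketch of the General Linear Model: the response variable is
# a weighted sum of independent variables, plus random error.
# The data below are simulated for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)              # one explanatory variable
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)     # true weights: intercept 2.0, slope 0.5

# Design matrix: a column of ones (for the intercept) plus the explanatory variable
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimates of the regression weights (parameters)
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print(weights)  # estimates should be close to the true values (2.0, 0.5)
```

The same least-squares machinery fits both regression and ANOVA models; only the columns of the design matrix differ.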
Educational research, for example, has a strong survey tradition and has relied more
heavily on correlation and regression techniques. In the language of regression, the
response variable is related to the weighted sum of independent variables. The weighting
factor is called a regression weight (coefficient) and the influence of a weighted
explanatory variable on the response variable is referred to as a regression effect, or
simply in terms of whether the regression coefficient is statistically significant.
In contrast, psychology has a strong experimental tradition and associated with this are
ANOVA techniques. In the language of analysis of variance, the independent variable is
a categorical variable, and we are interested in treatment effects and tests of significance.
The weighted independent variables which depend upon treatment combinations
represent treatment effects.
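That ANOVA is a variant of the same GLM can be shown by coding a categorical independent variable as indicator columns of a design matrix. A minimal sketch with made-up data (three hypothetical treatment groups, not from the text):

```python
import numpy as np

# Sketch: a one-way ANOVA expressed as a general linear model.
# Three treatment groups are coded as indicator (dummy) columns, so the
# fitted "regression weights" are the treatment (cell) means.
y = np.array([4.0, 5.0, 6.0,    # group A
              7.0, 8.0, 9.0,    # group B
              1.0, 2.0, 3.0])   # group C
group = np.repeat([0, 1, 2], 3)

# Cell-means coding: one indicator column per treatment group
X = np.eye(3)[group]
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
print(weights)  # → [5. 8. 2.], the three group means
```

Here the weighted independent variables are the group indicators, and the estimated weights are exactly the cell means, i.e. the treatment effects in cell-means form.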
Random error is generally defined as the difference between observed scores and those
predicted from a statistical model. In the regression framework it is estimated by the
difference between observed scores and those predicted from the fitted regression line,
whereas in an ANOVA statistical model error is estimated as the difference between
observed scores and cell means (treatment combination means). It is important to
consider regression and analysis of variance in the context of general linear models
because they are treated in a uniform way in many proprietary statistical computer
packages. When using proprietary statistical analysis programmes such as SPSS or SAS,
interpretation of statistical output is much easier to understand if terms such as model
sums of squares, error sums of squares, mean square error, r-square, parameter
estimates and regression weights are seen to be derived from a unified general linear
model.
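The two ways of estimating error can be computed side by side. The following sketch uses made-up data (the values are illustrative only): residuals from a fitted regression line and residuals from cell means, each being observed minus model-predicted scores.

```python
import numpy as np

# Sketch: error is estimated in the same way in both frameworks, as
# observed minus model-predicted values. In regression the predictions come
# from the fitted line; in ANOVA they are the cell (treatment) means.
y = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group = np.array([0, 0, 0, 1, 1, 1])

# Regression residuals: observed minus fitted-line predictions
X = np.column_stack([np.ones_like(x), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
reg_resid = y - X @ b

# ANOVA residuals: observed minus cell means
cell_means = np.array([y[group == g].mean() for g in (0, 1)])
anova_resid = y - cell_means[group]

print(reg_resid.round(6))  # all zero here: y lies exactly on a line
print(anova_resid)         # departures from the two group means
```

Error sums of squares, mean square error and r-square as reported by packages such as SPSS or SAS are all derived from residuals of this kind.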
When an investigator is interested in predicting the value of a response variable (either
mean predicted values for subgroups or individual predicted values) given the value of
another explanatory variable, and has a random sample of pairs of observations (X, Y)
with continuous measurements, and when it is reasonable to assume a linear
relationship between X and Y, then simple linear regression should be considered as a
possible analytic approach. There are additional assumptions which would need to be met