normally distributed. This is not true. It is only necessary that, in the case of the paired t-test, the difference scores are normally distributed, and in the case of linear regression it
is the residuals (difference between observed and predicted scores, i.e., errors) after
fitting the explanatory variable that should be normally distributed. The assumption of
normality refers to the population of interest and not just the sample of scores. Therefore,
in the above examples what is meant is that the difference scores and the residuals in the
population of interest are normally distributed. The assumptions of normality are
usually either taken on faith or, as Siegel and Castellan (1988) put it, ‘rest on conjecture
and hope’ (p. 35). Generally, when results are reported in journals the normality
assumptions (and other assumptions, such as independence of observations, homogeneity
of variance) are simply assumed to hold and are seldom tested or reported. Even when space
is at a premium, brief details about the validity of the underlying assumptions would greatly
enhance the trustworthiness of conclusions.
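To make the point concrete, the following is an illustrative sketch (with made-up data, using Python's scipy library rather than any procedure from this book) of checking the normality assumption where it actually applies: on the difference scores in a paired design, and on the residuals after fitting a simple linear regression.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Paired design: it is the *difference* scores, not the raw scores,
# that should be normally distributed.
pre = rng.normal(50, 10, 30)
post = pre + rng.normal(2, 5, 30)
diff = post - pre
w_diff, p_diff = stats.shapiro(diff)  # Shapiro-Wilk test on differences

# Simple linear regression: it is the *residuals* (observed minus
# predicted scores) that should be normally distributed.
x = rng.uniform(0, 10, 30)
y = 3.0 + 2.0 * x + rng.normal(0, 1, 30)
fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)
w_res, p_res = stats.shapiro(residuals)

print(f"Shapiro-Wilk on difference scores: W={w_diff:.3f}, p={p_diff:.3f}")
print(f"Shapiro-Wilk on residuals:         W={w_res:.3f}, p={p_res:.3f}")
```

A small p-value in either test would cast doubt on the corresponding normality assumption; reporting such a check takes only a line or two in a results section.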
In this chapter, the general format of previous chapters is followed but the section on
test assumptions is extended to include details of how to check assumptions. An overview
of the general ideas and statistical models underlying regression and analysis of variance
(ANOVA) is presented before each of these procedures is illustrated. Discussion about
what can be done when various parametric assumptions are not met is presented at the
end of this chapter.
8.1 Introduction to Regression and Correlation
In educational research regression analysis is one of the most widely used statistical
procedures. It should be considered when interest is centred on the dependence of a
response variable on an explanatory variable(s). For example, a primary school
headteacher may want to know whether a class teacher’s estimate of a pupil’s maths
ability will predict that pupil’s maths score on a standardized test of maths ability.
Regression analysis can be used to: (i) describe the relationship between the response
variable (score on a standardized maths test in the above example) and an explanatory
variable (teacher’s estimate of pupil ability in the above example); and (ii) predict the values
of a response variable from explanatory (independent) variables. When there is a linear
relationship between response and explanatory variables and when there is only one
relationship between response and explanatory variables and when there is only one
explanatory variable and one response variable we refer to this as a simple linear
regression. When there is one response variable but more than one explanatory variable
this is referred to as a multiple regression analysis. We use the term multivariate
regression when we have more than one response variable and any number of
explanatory variables.
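The headteacher's question above can be sketched as a simple linear regression. The data below are hypothetical (invented for illustration, using Python's scipy library): a teacher's estimate of each pupil's maths ability and the pupil's score on a standardized maths test.

```python
import numpy as np
from scipy import stats

# Hypothetical data: teacher's estimate of maths ability (explanatory
# variable) and standardized maths test score (response variable).
teacher_estimate = np.array([3., 5., 4., 6., 7., 5., 8., 6., 9., 7.])
maths_score = np.array([55., 62., 60., 68., 74., 64., 80., 70., 85., 75.])

# Fit the simple linear regression by least squares.
fit = stats.linregress(teacher_estimate, maths_score)
print(f"score = {fit.intercept:.1f} + {fit.slope:.2f} * estimate")

# Use the fitted line to predict the score of a new pupil
# whose teacher estimate is 6.
predicted = fit.intercept + fit.slope * 6
print(f"predicted standardized score: {predicted:.1f}")
```

The fitted line both describes the relationship (a positive slope means higher teacher estimates go with higher test scores) and provides predictions for new pupils, the two uses of regression identified above.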
Correlation analysis estimates the strength of the linear relationship between two or more
random variables. The parametric correlation statistic is the Pearson product
moment correlation. This is a quantitative index of the strength of the linear relationship
between two variables. If a researcher wants to determine the strength of relationship
between two variables then correlation analysis is appropriate. If, however, interest is
centred on how well a least squares model fits the data, or on the prediction of one variable
given values of another variable(s), then regression is the appropriate analytic technique.
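As a brief illustrative sketch (again with invented data and Python's scipy library), the Pearson product moment correlation quantifies only the strength of the linear relationship; unlike the regression example above, it fits no prediction line.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements on the same pupils.
x = np.array([2., 4., 5., 7., 8., 10.])
y = np.array([10., 14., 15., 19., 20., 24.])

# Pearson product moment correlation: a quantitative index, between
# -1 and +1, of the strength of the linear relationship.
r, p = stats.pearsonr(x, y)
print(f"Pearson product moment correlation r = {r:.3f} (p = {p:.4f})")
```

A value of r near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little or no linear relationship.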
Statistical analysis for education and psychology researchers 250