reported that initial exploratory analyses ‘eliminated several other potentially influential
independent variables...on the basis of a lack of theoretical rationale and promise, a lack
of statistical strength, measurement problems, multicollinearity with other factors already
in the model, or some combination of these’ (p. 170). Multicollinearity means that the
explanatory variables are highly correlated, with the consequence that some of the information
specified in the model is redundant. The authors went on to warn that ‘In regressions with only fifty
cases and as many as thirteen potential independent variables, there are dangers of
instability in regression coefficients’ (p. 170). To check for this, the authors examined the
behaviour of suspect coefficients under different model specifications, and used adjusted R^2
rather than R^2 to compare the fit of competing regression models.
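For reference, adjusted R^2 penalizes the ordinary R^2 for the number of explanatory variables in the model, which makes it the fairer yardstick when comparing models of different sizes. With n cases and p explanatory variables, the standard definition is

$$R^{2}_{\text{adj}} = 1 - \left(1 - R^{2}\right)\frac{n-1}{n-p-1}.$$

Unlike R^2, this quantity can fall when a variable that adds little explanatory power is entered into the model.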
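Returning to the multicollinearity point, a quick screen is to inspect the pairwise correlations among candidate explanatory variables. The following is a minimal sketch in Python; the data matrix is hypothetical, not the original study’s variables:

```python
import numpy as np

# Hypothetical data: rows are cases, columns are three candidate
# explanatory variables (illustrative values only).
X = np.array([[1.0, 2.1, 4.0],
              [2.0, 3.9, 5.5],
              [3.0, 6.2, 4.8],
              [4.0, 8.1, 6.0],
              [5.0, 9.8, 5.1]])

# Pairwise correlations; rowvar=False treats columns as variables.
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))

# Off-diagonal entries near +/-1 (here, columns 0 and 1) signal
# near-redundant information, i.e. multicollinearity.
```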
Worked Example
The following data, abstracted from part of a study on referential communication skills
by the author and colleagues, are used to illustrate the computational details of simple linear
regression. The two variables shown in Table 8.2 represent pupils’ standardized maths
attainment score (age-adjusted) and a teacher-estimated score of pupils’ general maths
ability, based on a rating scale of 1 (well below average) to 10 (well above average). A
simple linear regression model is fitted to these data with standardized maths score
(SMATHS) as the response variable, and teacher estimate of maths ability (MATHS) as
the explanatory variable.
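In symbols, the model to be fitted has the standard simple linear regression form, where β₀ is the intercept, β₁ the slope and εᵢ the random error for pupil i:

$$\mathrm{SMATHS}_i = \beta_0 + \beta_1\,\mathrm{MATHS}_i + \varepsilon_i, \qquad i = 1, \dots, 10.$$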
Table 8.2: Teachers’ estimate of maths ability (MATHS) and standardized
maths attainment score (SMATHS) for ten pupils

Pupil (ID)   MATHS (X)   SMATHS (Y)
17            5          110
18           10          133
19            5          109
20            3          114
24            8          128
27            5          109
28            8          119
29            5          119
60            1           95
61            6          118
When an investigator wants to i) find a regression relationship between a response
variable, Y, and an explanatory variable, X, or ii) find the effect of different values of xᵢ
on the response variable, Y, there are three computational steps involved (a computational
sketch in Python follows the list). These are:
1 Compute the sums of squares for X (denoted SSXX) and for Y (denoted SSYY), and the
cross-product sum of squares for XY (denoted SSXY).
2 Estimate the parameters β₀, β₁ and σ.
3 Write out the least squares regression line, substituting the parameter estimates for β₀
and β₁.
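The following minimal sketch in Python (an illustration, not the original authors’ software) carries out these three steps for the data in Table 8.2, using the standard least squares results: the slope estimate is SSXY/SSXX, the intercept estimate is the mean of Y minus the slope times the mean of X, and σ is estimated from (SSYY - slope × SSXY)/(n - 2):

```python
import numpy as np

# Data from Table 8.2: teacher estimate (MATHS, X) and standardized
# maths attainment score (SMATHS, Y) for ten pupils.
x = np.array([5, 10, 5, 3, 8, 5, 8, 5, 1, 6], dtype=float)
y = np.array([110, 133, 109, 114, 128, 109, 119, 119, 95, 118], dtype=float)
n = len(x)

# Step 1: sums of squares and the cross-product sum of squares.
ss_xx = np.sum((x - x.mean()) ** 2)
ss_yy = np.sum((y - y.mean()) ** 2)
ss_xy = np.sum((x - x.mean()) * (y - y.mean()))

# Step 2: least squares estimates of beta_1, beta_0 and sigma.
b1 = ss_xy / ss_xx                                # slope
b0 = y.mean() - b1 * x.mean()                     # intercept
sigma = np.sqrt((ss_yy - b1 * ss_xy) / (n - 2))   # residual SD

# Step 3: write out the fitted regression line.
print(f"SMATHS-hat = {b0:.2f} + {b1:.2f} * MATHS (sigma-hat = {sigma:.2f})")
```

Run as written, this prints a slope of about 3.70, an intercept of about 94.67 and a residual standard deviation of about 5.03, so the fitted line is SMATHS-hat = 94.67 + 3.70 × MATHS.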