variable mean is printed, as is the coefficient of variation (CV), a unitless
measure of variation evaluated as (RMSE / mean of the response variable) × 100.
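As a minimal sketch of the CV calculation, the Python function below recovers the RMSE from the residuals and scales it by the mean of the response. It assumes a simple regression with an intercept and one slope, so the error degrees of freedom are n − 2. The function name and the toy numbers are hypothetical and are not taken from Figure 8.6.

```python
import math

def coefficient_of_variation(residuals, y, n_params=2):
    """Return CV = (RMSE / mean of the response variable) * 100.

    Assumes the residuals come from a model with n_params estimated
    parameters (2 = intercept + one slope in simple regression).
    """
    n = len(y)
    sse = sum(e * e for e in residuals)      # error sum of squares
    rmse = math.sqrt(sse / (n - n_params))   # root mean square error
    mean_y = sum(y) / n                      # mean of the response variable
    return rmse / mean_y * 100

# Hypothetical residuals and responses, for illustration only
print(coefficient_of_variation([1.0, -1.0, 1.0, -1.0], [10, 10, 10, 10]))
```

Because the RMSE and the mean share the units of the response, their ratio is unitless.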
The next section of output contains the parameter estimates. In Figure 8.6 there are
two parameter estimates, the intercept and MATHS. Point estimates for the parameters
and their associated standard errors are also printed; these values correspond with the values
in the worked example (see formulas 8.5 and 8.6). Interpretation of these parameter
estimates is also the same. The t-test for the hypothesis of zero regression slope is
printed. The t-value, evaluated as the ratio of the parameter estimate to its standard error,
is 5.717, which has an associated p-value of 0.0004. This indicates that the variable
MATHS makes a significant contribution to the prediction of the response variable
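SMATHS. As a rough rule of thumb, if the parameter estimate is more than twice its standard
error, this suggests that the explanatory variable is significant at about the 5 per cent level;
the exact two-tailed critical value of t depends on the error degrees of freedom (it is 1.96
only in very large samples and is larger in small ones).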
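The following Python sketch shows the kind of calculation that lies behind the parameter-estimates section for a simple regression: least-squares estimates, their standard errors from the usual simple-regression formulas, and the t-ratio of each estimate to its standard error. The function name and the data are hypothetical; they are not the MATHS/SMATHS data of Figure 8.6.

```python
import math

def simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x with standard errors and t-ratios."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))   # RMSE on n - 2 error degrees of freedom
    se_b1 = s / math.sqrt(sxx)
    se_b0 = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {
        "b0": b0, "se_b0": se_b0, "t_b0": b0 / se_b0,
        "b1": b1, "se_b1": se_b1, "t_b1": b1 / se_b1,
        "df": n - 2,
    }

fit = simple_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(fit["b1"], fit["se_b1"], fit["t_b1"], fit["df"])
```

With these toy data the slope is 0.6 with a standard error of about 0.283, giving t ≈ 2.12. That exceeds twice the standard error, yet with only 3 error degrees of freedom the two-tailed 5 per cent critical value of t is about 3.18, so the slope would not be significant; the "twice its standard error" guide is only approximate in small samples.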
The last section of output in Figure 8.6 contains a printout of predicted values and
residual scores for the fitted model, as well as statistics for regression diagnostics. These
latter diagnostic statistics are not particularly informative. Student residuals, evaluated as
residual/SE(residual), are printed and shown in a schematic plot; they are interpreted as a
t-like statistic. Two further diagnostic statistics are printed: Cook's D, a kind of influence
statistic, and the Press statistic. Cook's D indicates what effect a particular observation has
on the regression when that observation is deleted, as reflected in the change in the
predicted values. The Press statistic (the sum of the squared Press residuals, each being the
difference between an observed value and its prediction from the model fitted without that
observation) is interpreted in a similar way. For both statistics, the smaller they are, the
better. Finally, the sum of squared residuals is printed; notice that this is the same as the
error sum of squares in the ANOVA part of the output.
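A minimal Python sketch of these diagnostics for a simple regression follows. It uses the standard leverage formula h = 1/n + (x − x̄)²/Sxx; the Student residual is the residual divided by its standard error s√(1 − h); Cook's D is computed as (Student residual)² × h / (p(1 − h)) with p = 2 parameters; and each Press residual is residual/(1 − h), which equals the observed value minus its prediction from the model refitted without that observation. The function name, dictionary keys and data are hypothetical, and the output layout is not meant to mimic the printout in Figure 8.6.

```python
import math

def regression_diagnostics(x, y):
    """Simple-regression fit plus Student residuals, Cook's D and Press residuals."""
    n = len(x)
    p = 2  # estimated parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    predicted = [b0 + b1 * xi for xi in x]
    residuals = [yi - yhat for yi, yhat in zip(y, predicted)]
    sse = sum(e * e for e in residuals)   # same as the ANOVA error sum of squares
    s = math.sqrt(sse / (n - p))          # RMSE
    rows = []
    for xi, yhat, e in zip(x, predicted, residuals):
        h = 1 / n + (xi - x_bar) ** 2 / sxx       # leverage of this observation
        student = e / (s * math.sqrt(1 - h))      # residual / SE(residual)
        cooks_d = student ** 2 * h / (p * (1 - h))
        press_residual = e / (1 - h)              # y_i minus prediction without obs i
        rows.append({"predicted": yhat, "residual": e, "student": student,
                     "cooks_d": cooks_d, "press_residual": press_residual})
    press = sum(r["press_residual"] ** 2 for r in rows)
    return {"b0": b0, "b1": b1, "s": s, "sse": sse, "press": press, "rows": rows}

diag = regression_diagnostics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
for r in diag["rows"]:
    print(round(r["student"], 3), round(r["cooks_d"], 3), round(r["press_residual"], 3))
print("SSE:", diag["sse"], "Press:", diag["press"])
```

The closed-form expressions avoid refitting the model n times, but they give the same Press residuals and Cook's D values as explicitly deleting each observation and refitting.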
8.3 Pearson’s Correlation r
When to Use
The Pearson product moment correlation, r, may be used as a sample estimate of the
population correlation, ρ (rho). It is a dimensionless index of the linear relationship
between two random variables: a value of zero means that there is no linear relationship
between the variables, and a value of +1 or −1 indicates a perfect linear relationship. A
negative correlation means that high values on one variable are associated with low
values on the other variable. Values of r may vary between −1 and +1 irrespective of the
units of measurement of the two variables (assuming they approximate at least an
interval level of measurement). Thus the correlation between age, measured in months,
and a teacher’s estimate of maths ability, measured on a 10-point scale, would be
unaffected by the units of measurement (but may well be affected by the different
measurement ranges of each variable; see p. 285). A partial correlation is an index of the
relationship between two variables with the effect of a third variable partialled out (held
constant).
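A short Python sketch of the two indices just described follows. `pearson_r` uses the usual definition (the sum of cross-products divided by the square root of the product of the two sums of squares), and `partial_r` uses the standard first-order partial correlation formula computed from the three pairwise correlations. The function names and the age/rating data are hypothetical; rescaling age from months to years illustrates that r is unaffected by the units of measurement.

```python
import math

def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length samples."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical data: age in months and a teacher's rating on a 10-point scale
age_months = [96, 102, 110, 115, 121, 130]
rating = [4, 5, 5, 7, 6, 8]
age_years = [m / 12 for m in age_months]
print(pearson_r(age_months, rating))
print(pearson_r(age_years, rating))  # same value: r ignores a change of units
```

Multiplying a variable by a positive constant or adding a constant leaves r unchanged; multiplying one variable by a negative constant only reverses the sign of r.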
The Pearson correlation, r, should be considered as a descriptive statistic when a
researcher wants to quantify the extent of linear relationship between variables. A
parametric correlation would be appropriate whenever quantitative measures are taken
simultaneously on two or more variables, the relationship between the two variables is
Statistical analysis for education and psychology researchers 280