consider deleting outlier data points. The next section of output provides results of
statistical tests of model fit and parameter estimates.
The first section of SAS printout shown in Figure 8.6 is an ANOVA table for
regression which provides information on sources of variation in the response variable
and overall model fit. The first line of output identifies a model label which, by default, is
assigned MODEL1. Following this, the response variable, SMATHS, is labelled. The
sources of variation in the response variable are partitioned into model variation,
accounted for by the linear regression, and error variation, that is, all other sources of
variation in the response variable. The corrected total sum of squares, printed as C Total
in the output, is the total variation in the response variable (the sum of the model and
error sums of squares).
Associated degrees of freedom and sums of squares for these sources of variance are
printed in the output. These values are the same as the values computed in the worked
example. The TOTALdf is given by n−1, where n is the number of cases; the MODELdf
is equal to the number of explanatory variables (one further degree of freedom is
assigned to the intercept); and the ERRORdf is given by n−(number of b parameters,
counting the intercept).
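As a quick check on this partition, here is a minimal sketch in Python (the values of n and k are illustrative placeholders, not taken from the worked example):

```python
# Degrees-of-freedom partition for least squares regression with
# n cases, k explanatory variables and an intercept.
n = 30  # number of cases (illustrative placeholder)
k = 1   # one explanatory variable, as in simple linear regression

df_total = n - 1        # TOTALdf: corrected total degrees of freedom
df_model = k            # MODELdf: one df per explanatory variable
df_error = n - (k + 1)  # ERRORdf: n minus the number of b parameters

# The model and error df always sum to the corrected total df.
assert df_total == df_model + df_error
```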
The mean square (MS) values are the sums of squares divided by their appropriate
degrees of freedom. The F-test, a test of overall model fit, is the ratio MSmodel/MSerror;
this is the same F-value as that obtained in the worked example. Here the small p-value
associated with the F-statistic indicates that MSmodel is significantly larger than MSerror;
that is, the null hypothesis is rejected and we conclude that the least squares regression
line explains a significant proportion of the variation in the response variable. An estimate of
the proportion of this explained variation is given by the coefficient of determination
labelled R-square in the SAS output. More precisely, this statistic provides a measure of
the variation in the response variable about its mean that is explained by the linear
regression line. It is evaluated as 1−(SSerror/SStotal), which in this example is
1−0.1967=0.803. We can say that about 80 per cent of the variation in the sample
SMATHS scores is explained by the least squares regression model.
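A hedged sketch of these overall-fit calculations in Python, assuming the sums of squares have already been computed (scipy's F distribution supplies the p-value; the function and variable names here are mine, not SAS's):

```python
from scipy.stats import f as f_dist

def overall_fit(ss_model, ss_error, df_model, df_error):
    """F-test and R-square from an ANOVA-for-regression table."""
    ms_model = ss_model / df_model  # model mean square
    ms_error = ss_error / df_error  # error mean square
    f_value = ms_model / ms_error   # F-test of overall model fit
    # Upper-tail probability of this F-value under the null hypothesis.
    p_value = f_dist.sf(f_value, df_model, df_error)
    # Coefficient of determination: 1 - (SSerror/SStotal), where
    # SStotal is the sum of the model and error sums of squares.
    r_square = 1.0 - ss_error / (ss_model + ss_error)
    return f_value, p_value, r_square
```

With SSerror/SStotal=0.1967 as in the example, r_square comes out at about 0.803.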
The coefficient of determination is equivalent to the squared Pearson correlation (r^2)
between the explanatory and response variables and is sometimes called r squared. In a
multiple regression the statistic R^2 is called the squared multiple correlation and it
represents the proportion of variation in the response variable explained by the linear
combination of explanatory variables. The Multiple Correlation R is the correlation
between the observed and predicted scores on the response variable. An r^2 (Pearson
correlation squared) or R^2 of zero would indicate no linear relationship and a value of 1
would suggest a perfect linear relationship. In the SAS output a value for an Adjusted
R-square is printed. This is evaluated as 1−(MSerror/MStotal), which in this example is
1−0.2213=0.7787. This value is less than the unadjusted R^2 because it accounts for both
sample size and, in multiple regression, the number of explanatory variables in the model.
Too much importance should not be attributed to the interpretation of R^2 without
considering the substantive meaning of the regression model. That is, does the model
make sense and are there any redundant variables? It is recommended that the adjusted R^2
be used when comparing multiple regression models because it adjusts for the number of
parameters in the regression model (independent variables).
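Continuing the sketch above, the adjusted statistic simply replaces the sums of squares with mean squares, which is what penalizes additional parameters (again an illustrative sketch, not SAS's internal routine; p counts all b parameters, including the intercept):

```python
def adjusted_r_square(ss_error, ss_total, n, p):
    """Adjusted R-square: 1 - (MSerror/MStotal)."""
    ms_error = ss_error / (n - p)  # error mean square
    ms_total = ss_total / (n - 1)  # corrected total mean square
    return 1.0 - ms_error / ms_total
```

The Root MSE reported in the same table is simply ms_error ** 0.5, the square root of the error mean square.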
Also contained in the ANOVA table is the Root MSE value, the square root of the mean
square error, which can be interpreted as the standard deviation of the response variable
about the fitted regression line in the population (an estimate of the parameter σ). The response

