thirty observations. The corrected total degrees of freedom and the partitioning into
between and within sources should then be checked. Considering next the overall model
fit, this is significant, F=15.06; df=13, 16; p<0.0001, indicating that the independent
variables have a significant effect. Notice that the model sums of squares (and df) are
equal to the total of the sums of squares (and df) for (sex), (subj(sex)), (time), and
(sex*time), indicating the additive nature of ANOVA. Considering the first hypothesis,
whether there is a significant main effect of sex, the observed F-statistic, obtained from
the bottom of the SAS output, is F=1.96; df=1, 8; p=0.199, which indicates that the null
hypothesis cannot be rejected. It is therefore concluded that there is no evidence of a
difference between males and females in their mean vocabulary scores. The reader should note that
an F-value for the effect of sex is printed in the body of the ANOVA table but this is
based on an inappropriate error term. The output at the bottom of the table reminds the
analyst that the requested denominator for the F-test has been used, ‘Tests of Hypotheses
using the Type III MS for subj(sex) as an error term’. The Type III sums of squares are
used by default but Type I, II or IV can be requested.
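
The additivity of the sums of squares, and the choice of subj(sex) as the error term for the test of sex, can be checked by hand. The following is a minimal sketch in Python rather than SAS; the scores are hypothetical (they are not the data analysed above) and a balanced design with five subjects per sex is assumed, so only the degrees of freedom, not the F-values, match the output discussed in the text.

```python
# Hypothetical vocabulary scores (NOT the book's data):
# 2 sexes x 5 subjects x 3 occasions = 30 observations, balanced design assumed.
data = {
    "male":   [[10, 14, 15], [12, 15, 17], [9, 13, 13], [11, 16, 18], [13, 15, 16]],
    "female": [[11, 12, 12], [14, 14, 15], [10, 12, 11], [12, 13, 14], [13, 13, 13]],
}

def mean(xs):
    return sum(xs) / len(xs)

groups = list(data)
a = len(groups)              # levels of sex
n = len(data[groups[0]])     # subjects per sex
t = len(data[groups[0]][0])  # measurement occasions

all_scores = [y for g in groups for subj in data[g] for y in subj]
grand = mean(all_scores)
sex_means = {g: mean([y for subj in data[g] for y in subj]) for g in groups}
time_means = [mean([subj[j] for g in groups for subj in data[g]]) for j in range(t)]
cell_means = {g: [mean([subj[j] for subj in data[g]]) for j in range(t)] for g in groups}

# Sums of squares for each source of variation.
ss_total = sum((y - grand) ** 2 for y in all_scores)
ss_sex = n * t * sum((sex_means[g] - grand) ** 2 for g in groups)
ss_subj = t * sum((mean(subj) - sex_means[g]) ** 2 for g in groups for subj in data[g])
ss_time = a * n * sum((m - grand) ** 2 for m in time_means)
ss_sextime = n * sum(
    (cell_means[g][j] - sex_means[g] - time_means[j] + grand) ** 2
    for g in groups for j in range(t)
)
# Residual (within-subject error), computed directly rather than by subtraction.
ss_error = sum(
    (subj[j] - mean(subj) - cell_means[g][j] + sex_means[g]) ** 2
    for g in groups for subj in data[g] for j in range(t)
)
ss_model = ss_sex + ss_subj + ss_time + ss_sextime

df = {"sex": a - 1, "subj(sex)": a * (n - 1), "time": t - 1, "sex*time": (a - 1) * (t - 1)}
df_model = sum(df.values())
df_total = a * n * t - 1
df_error = df_total - df_model

ms_error = ss_error / df_error
# Sex is a between-subjects effect, so it is tested against MS for subj(sex).
f_sex = (ss_sex / df["sex"]) / (ss_subj / df["subj(sex)"])
# Time and sex*time are within-subjects effects, tested against the residual MS.
f_time = (ss_time / df["time"]) / ms_error
f_sextime = (ss_sextime / df["sex*time"]) / ms_error

print("df: model", df_model, "error", df_error, "total", df_total)
print("SS total:", round(ss_total, 4), " SS model + SS error:", round(ss_model + ss_error, 4))
print("F sex:", round(f_sex, 2), " F time:", round(f_time, 2), " F sex*time:", round(f_sextime, 2))
```

With this layout the model df are 1 + 8 + 2 + 2 = 13 and the error df are 29 - 13 = 16, which agrees with the degrees of freedom reported for the overall model fit.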
A significant difference is found among the three means for the pre-, post- and
delayed-post-tests (Time in the analysis), F=22.81; df=2, 16; p<0.001, suggesting there
may be a trend in the data. These means should be plotted, but with only three
measurement points in time it probably would not be worth examining the data for trends
using polynomial terms. Finally, the group by repeated measures interaction, sex*time, is
significant, F=12.60; df=2, 16; p<0.001. This indicates that the difference between male
and female vocabulary scores depends upon the measurement occasion. Plots of the simple
effects, that is, the effects of one variable at individual levels of the other variable (for
example, a plot of mean vocabulary scores, Y-axis, against measurement occasion, X-axis,
for males and females separately), are likely to be informative.
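
The cell means needed for such a plot are easily tabulated. The sketch below, in Python, uses hypothetical scores and occasion labels (assumptions for illustration, not the data analysed above) and prints the mean for each sex at each occasion together with the male minus female difference, which is the simple effect of sex at that occasion.

```python
from statistics import mean

# Hypothetical scores (NOT the book's data): one row per subject, one column per occasion.
occasions = ["pre", "post", "delayed"]
data = {
    "male":   [[10, 14, 15], [12, 15, 17], [9, 13, 13], [11, 16, 18], [13, 15, 16]],
    "female": [[11, 12, 12], [14, 14, 15], [10, 12, 11], [12, 13, 14], [13, 13, 13]],
}

# Mean score for each sex at each measurement occasion (the points to be plotted).
cell_means = {
    sex: [mean(subj[j] for subj in subjects) for j in range(len(occasions))]
    for sex, subjects in data.items()
}

# Simple effect of sex at each occasion: male mean minus female mean.
sex_difference = [m - f for m, f in zip(cell_means["male"], cell_means["female"])]

print(f"{'occasion':<10}{'male':>8}{'female':>8}{'diff':>8}")
for j, label in enumerate(occasions):
    print(f"{label:<10}{cell_means['male'][j]:>8.2f}"
          f"{cell_means['female'][j]:>8.2f}{sex_difference[j]:>8.2f}")
```

If the differences in the last column change size or sign across occasions, the two lines in the plot will not be parallel, which is what a significant interaction implies.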
8.11 What Can Be Done when Assumptions of Homogeneity of
Variance and Normality of Residuals Are Not Met?
This is an issue that most researchers face at some time or another but is generally not
discussed in introductory statistical texts.
First, to deal with the two-sample problem of unequal variances when the assumption
of normality is reasonable, a modified t-test, called the Satterthwaite approximation
procedure, should be used. This was described in the section on t-tests; for more detailed
discussion the reader is referred to Winer (1971).
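
As a reminder of what the procedure computes, here is a minimal sketch in Python of the unequal-variance t-statistic with Satterthwaite's approximate degrees of freedom. The function name and the example scores are hypothetical; the p-value is not computed here and would be obtained by referring t' to the t-distribution with the approximate df.

```python
from math import sqrt
from statistics import mean, variance

def satterthwaite_t(x, y):
    """Return (t', approximate df) for two independent samples with unequal variances."""
    n1, n2 = len(x), len(y)
    # Squared standard error of each sample mean (sample variance uses n - 1).
    v1 = variance(x) / n1
    v2 = variance(y) / n2
    t_prime = (mean(x) - mean(y)) / sqrt(v1 + v2)
    # Satterthwaite's approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_prime, df

# Hypothetical scores for two groups with clearly different spread.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]
t_prime, df = satterthwaite_t(group_a, group_b)
print(round(t_prime, 3), round(df, 2))
```

The approximate df is generally not a whole number and lies between the smaller of n1 - 1 and n2 - 1 and n1 + n2 - 2.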
When the assumption of normality is also violated, one remedy is to perform a rank
transformation on the raw scores and then use the modified Satterthwaite t’-test on the
ranks of the scores instead of the scores themselves. This procedure will, to a large extent,
eliminate the effects of both non-normality and unequal variances. Zimmerman and
Zumbo (1993) provide a very readable account with straightforward practical guidance
about what to do in this situation.
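
The two steps just described can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' own procedure: the scores from both groups are ranked together, tied scores receive the average of their ranks, and the function names and example data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def midranks(values):
    """Rank values from 1 upwards; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def satterthwaite_t(x, y):
    """Unequal-variance t' and Satterthwaite's approximate df."""
    v1 = variance(x) / len(x)
    v2 = variance(y) / len(y)
    t_prime = (mean(x) - mean(y)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t_prime, df

def rank_satterthwaite_t(x, y):
    """Rank the combined sample, then apply the t'-test to the ranks."""
    ranks = midranks(list(x) + list(y))
    return satterthwaite_t(ranks[:len(x)], ranks[len(x):])

# Hypothetical scores: group_a contains one extreme value.
group_a = [1, 2, 3, 4, 50]
group_b = [5, 6, 7, 8, 9]
print(rank_satterthwaite_t(group_a, group_b))
```

Because only the ordering of the scores enters the calculation, the extreme value of 50 has no more influence than any other score ranked highest would have.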
When more than two means are compared, the same principles can be applied; for
example, PROC GLM can be used on the ranked scores, or non-parametric procedures such as
those described in Chapter 7. Alternatively, other data transformations should be
considered, such as a log or square-root transformation. The purpose of data transformation is
Statistical analysis for education and psychology researchers 342