16.4 Analysis of Variance with Unequal Sample Sizes
The least-squares approach to the analysis of variance is particularly useful for the case of factorial
experiments with unequal sample sizes. However, special care must be used in selecting
the particular restricted models that are employed in generating the various sums of squares.
Several different models could underlie an analysis of variance. Although in the case of
equal sample sizes these models all lead to the same results, in the unequal n case they do
not. This is because with unequal ns, the row, column, and interaction effects are no longer
orthogonal and thus account for overlapping portions of the variance. (I would strongly
recommend quickly reviewing the example given in Chapter 13, Section 13.11,
pp. 444–446.) Consider the Venn diagram in Figure 16.1. The area enclosed by the
surrounding square will be taken to represent SStotal. Each circle represents the variation
attributable to (or accounted for by) one of the effects. The area outside the circles but
within the square represents SSerror. Finally, the total area enclosed by the circles represents
$SS_{\text{regression}_{\alpha,\beta,\alpha\beta}}$, which is the sum of squares for regression when all the terms are included
in the model. If we had equal sample sizes, none of the circles would overlap, and each
effect would be accounting for a separate, independent portion of the variation. In that
case, the decrease in SSregression resulting from deleting an effect from the model would
have a clear interpretation—it would be the area enclosed by the omitted circle and thus
would be the sum of squares for the corresponding effect.
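The non-overlap claim can be checked numerically. The following is a minimal sketch, assuming a 2 x 2 design with effect coding (+1/-1) for A and B and their product for AB; the cell counts and the helper names `effect_columns` and `centered_dot` are hypothetical and chosen only for illustration. With equal cell sizes the centered cross-products among the A, B, and AB columns are all zero (the circles do not overlap); with unequal cell sizes they generally are not.

```python
def effect_columns(cell_counts):
    """Build effect-coded A, B, and AB columns for a 2 x 2 design.

    cell_counts maps (row, col) -> n for that cell (hypothetical counts).
    Level 0 is coded +1 and level 1 is coded -1; AB is the product of A and B.
    """
    a, b, ab = [], [], []
    for (i, j), n in cell_counts.items():
        a_code = 1 if i == 0 else -1
        b_code = 1 if j == 0 else -1
        a += [a_code] * n
        b += [b_code] * n
        ab += [a_code * b_code] * n
    return a, b, ab


def centered_dot(x, y):
    """Sum of cross-products of deviations from the means.

    This is proportional to the covariance, so zero means the two
    predictors are uncorrelated (orthogonal after centering).
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))


# Hypothetical cell sizes: one balanced design, one unbalanced design.
equal_n = {(0, 0): 5, (0, 1): 5, (1, 0): 5, (1, 1): 5}
unequal_n = {(0, 0): 8, (0, 1): 2, (1, 0): 3, (1, 1): 7}

for label, counts in [("equal n", equal_n), ("unequal n", unequal_n)]:
    a, b, ab = effect_columns(counts)
    print(label,
          "A.B =", centered_dot(a, b),
          "A.AB =", centered_dot(a, ab),
          "B.AB =", centered_dot(b, ab))
```

When the cross-products are zero, removing one effect from the model cannot change the variation credited to the others, which is why the decrement in SSregression has an unambiguous meaning in the equal-n case.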
But what do we do when the circles overlap? If we were to take a model that included
terms for A, B, and AB and compare it to a model containing only A and B terms, the decrement
would not represent the area of the AB circle, since some of that area still would be
accounted for by A and/or B. Thus, SSAB, which we calculate as
$SS_{AB} = SS_{\text{regression}_{\alpha,\beta,\alpha\beta}} - SS_{\text{regression}_{\alpha,\beta}}$,
represents only the portion of the enclosed area that is unique to AB: the area labeled with
a “3.” So far, all the models that have been seriously proposed are in agreement. SSAB is that
portion of the AB circle remaining after adjusting for A and B.
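The model comparison that yields SSAB can be sketched in a few lines. This is an illustration under stated assumptions, not the text's own computation: the scores are hypothetical data for an unbalanced 2 x 2 design, the predictors use effect coding (+1/-1), and `solve` and `ss_regression` are hypothetical helper names that fit the model by solving the normal equations directly.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ss_regression(columns, y):
    """SS_regression for a least-squares fit of y on an intercept plus columns."""
    n = len(y)
    x = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(x[0])
    xtx = [[sum(row[j] * row[k] for row in x) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(x, y)) for j in range(p)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * xj for c, xj in zip(coefs, row)) for row in x]
    mean_y = sum(y) / n
    return sum((f - mean_y) ** 2 for f in fitted)


# Hypothetical scores, keyed by (level of A, level of B); cell sizes are unequal.
cells = {
    (0, 0): [5, 7, 6, 8],
    (0, 1): [9, 11],
    (1, 0): [4, 6, 5],
    (1, 1): [12, 14, 13, 15, 11],
}

a, b, ab, y = [], [], [], []
for (i, j), scores in cells.items():
    a_code = 1.0 if i == 0 else -1.0
    b_code = 1.0 if j == 0 else -1.0
    for score in scores:
        a.append(a_code)
        b.append(b_code)
        ab.append(a_code * b_code)
        y.append(float(score))

ss_full = ss_regression([a, b, ab], y)   # model with A, B, and AB
ss_reduced = ss_regression([a, b], y)    # model with A and B only
ss_ab = ss_full - ss_reduced             # portion unique to AB (area "3")

print("SS_regression (A, B, AB):", round(ss_full, 4))
print("SS_regression (A, B):    ", round(ss_reduced, 4))
print("SS_AB:                   ", round(ss_ab, 4))
```

Because the reduced model already contains A and B, the difference credits AB only with variation that neither main effect can account for, which is the adjustment described above.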
But now things begin to get a little sticky. Two major approaches have been put forth
that differ in the way the remainder of the pie is allotted to A and B. Overall and Spiegel
(1969) put forth three models for the analysis of variance, and these models continue to
generate a voluminous literature debating their proper use and interpretation, even though
the discussion began 30 years ago. We will refer to these models as Type I, Type II, and
Type III, from the terminology used by SPSS and SAS. (Overall and Spiegel numbered
them in the reverse order, just to make things more confusing.) Basically, the choice between
the three models hinges on how we see the relationship between the sample size and
the treatments themselves, or, more specifically, how we want to weight the various cell
Figure 16.1 Venn diagram representing portions of overall variation (circles A, B, and AB; regions numbered 1–7)