Encyclopedia of Sociology

ANALYSIS OF VARIANCE AND COVARIANCE

this study included a disproportionate number of
unhappy cohabitors or overly satisfied noncohabitors,
or both. There is always some probability that the
sample will not be representative, and the F statis-
tic utilizes probability theory (under the assump-
tion that the sample was obtained through ran-
dom selection) to assess that likelihood.


The logic behind the F statistic is that chance
fluctuations in sampling are less likely to account
for differences in sample means if the differences
are large, if the variation in outcome scores in
the population from which the sample was drawn
is small, if the sample size is large, or if all these
situations have occurred. Obviously, large mean
differences are unlikely to be due to chance because
they would require many more extremely
unrepresentative cases to be selected into the
sample. Selecting extreme cases, however, is more
likely if there are many extremes in the population
(i.e., the variation of scores is great). Large samples,
in contrast, reduce the likelihood of an
unrepresentative sample because any extreme
cases are more likely to be counteracted by
extremes in the opposite direction or by cases that
are more typical.
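
To illustrate this logic, the following Python sketch simulates the sampling
process; the group means, standard deviation, and sample sizes are hypothetical
values chosen only for demonstration, not results reported in this article.

    # A minimal simulation sketch, assuming hypothetical population means,
    # a common standard deviation, and sample sizes chosen for illustration.
    import numpy as np
    from scipy.stats import f_oneway

    rng = np.random.default_rng(0)

    def simulated_f(mean_a, mean_b, sd, n):
        """Draw two random samples and return the F statistic for their mean difference."""
        group_a = rng.normal(mean_a, sd, size=n)
        group_b = rng.normal(mean_b, sd, size=n)
        f_value, _ = f_oneway(group_a, group_b)
        return f_value

    # Larger mean differences and larger samples both tend to produce larger F values.
    print(simulated_f(mean_a=5.0, mean_b=5.5, sd=2.0, n=30))    # small difference, small sample
    print(simulated_f(mean_a=5.0, mean_b=7.0, sd=2.0, n=30))    # larger difference
    print(simulated_f(mean_a=5.0, mean_b=5.5, sd=2.0, n=500))   # small difference, large sample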


The general equation for computing F is as
follows:


F = MSBETWEEN / MSWITHIN   (5)

The MSBETWEEN is the mean square for be-
tween-group differences. It is an adjusted version
of the SSBETWEEN and reflects the degree of differ-
ence between group means expressed as individu-
al differences in scores. An adjustment is made to
the SSBETWEEN because this value can become
artificially high by chance as a function of the
number of group comparisons being made. This
adjustment factor is called the degrees of freedom
(DFBETWEEN) and is equal to the number of group
comparisons (k−1, where k is the number of groups).
The formula for the MSBETWEEN is:


MSBETWEEN = SSBETWEEN / (k − 1)   (6)

The larger the MSBETWEEN, the greater the value of
F and the lower the probability that the sample
results were due to chance.
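
As an illustration of equation (6), the following sketch computes the
SSBETWEEN and MSBETWEEN for a small set of hypothetical group scores;
the sums of squares follow the standard one-way analysis of variance
definitions, and the data are invented for demonstration only.

    # A minimal sketch of equation (6), using hypothetical scores for k = 3 groups.
    import numpy as np

    groups = [
        np.array([4.0, 5.0, 6.0, 5.5]),   # hypothetical group 1 scores
        np.array([6.0, 7.0, 6.5, 7.5]),   # hypothetical group 2 scores
        np.array([5.0, 5.5, 6.0, 4.5]),   # hypothetical group 3 scores
    ]

    grand_mean = np.concatenate(groups).mean()
    k = len(groups)

    # SS_between: squared deviations of each group mean from the grand mean,
    # weighted by the number of cases in the group.
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

    # MS_between adjusts SS_between by the degrees of freedom (k - 1).
    ms_between = ss_between / (k - 1)
    print(ss_between, ms_between)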


The MSWITHIN is the mean square for within-
group differences. This is equivalent to the SSWITHIN,
with an adjustment made for the size of the sample
minus the number of groups (DFWITHIN). Since
the SSWITHIN represents the amount of variation
in scores within each group, it is used in the F
statistic as an estimate of the amount of variation
in scores that exists in the populations from which
the sample groups were drawn. This is essentially a
measure of the potential for error in the sample
means. This potential for error is reduced, howev-
er, as a function of the sample size. The formula
for the MSWITHIN is:

MSWITHIN = SSWITHIN / (N − k)   (7)
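
Continuing the same hypothetical example, the sketch below computes the
SSWITHIN and MSWITHIN from equation (7) and then the F statistic from
equation (5); again, the scores are illustrative only.

    # A minimal sketch of equations (7) and (5), continuing the hypothetical groups above.
    import numpy as np

    groups = [
        np.array([4.0, 5.0, 6.0, 5.5]),
        np.array([6.0, 7.0, 6.5, 7.5]),
        np.array([5.0, 5.5, 6.0, 4.5]),
    ]

    grand_mean = np.concatenate(groups).mean()
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)

    # SS_within: squared deviations of each score from its own group mean.
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

    # MS_within adjusts SS_within by the degrees of freedom (N - k).
    ms_within = ss_within / (n_total - k)

    # Equation (5): F is the ratio of between-group to within-group mean squares.
    f_value = ms_between / ms_within
    print(ms_within, f_value)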

As can be seen in equations six and seven,
when the number of groups is high, the estimate of
variation between groups is adjusted downward to
account for the greater chance of variation. When
the number of cases is high, the estimation of
variation within groups is adjusted downward. As
a result, the larger the number of cases being
analyzed, the higher the F statistic. A high F value
reflects greater confidence that any differences in
sample means reflect differences in the popula-
tions. Using certain assumptions, the probability
that any given F value could be obtained by chance,
given the degrees of freedom based on the number of
groups (DF1 = k − 1) and the number of cases
(DF2 = N − k), can be calculated for the actual
F value. If this chance probability is only
5 percent or less, then the null hypothesis is reject-
ed and the sample mean differences are said to be
‘‘significant’’ (i.e., not likely due to chance, but to
actual effects in the population).
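
The chance probability described above can be obtained from the F distribution
with DF1 and DF2 degrees of freedom. The following sketch, using a hypothetical
F value and hypothetical numbers of groups and cases, illustrates the calculation
and the 5 percent decision rule.

    # A minimal sketch of the significance test, assuming a hypothetical F value
    # computed from k = 3 groups and N = 12 cases.
    from scipy.stats import f

    f_value = 6.21          # hypothetical observed F
    df1 = 3 - 1             # DF1 = k - 1
    df2 = 12 - 3            # DF2 = N - k

    # Probability of an F this large or larger under the null hypothesis.
    p_value = f.sf(f_value, df1, df2)

    if p_value <= 0.05:
        print(f"p = {p_value:.3f}: reject the null hypothesis; the mean differences are significant")
    else:
        print(f"p = {p_value:.3f}: do not reject the null hypothesis")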

ADJUSTING FOR COVARIATES

Analysis of variance can be used whenever the
predictor variable(s) has a limited number of dis-
crete categories and the outcome variable is con-
tinuous. In some cases, however, an additional
continuous predictor variable needs to be includ-
ed in the analysis or some continuous source of
extraneous effect needs to be ‘‘controlled for’’
before the group effects can be assessed. In these
cases, analysis of covariance can be used as a
simple extension of the analysis of variance model.
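
In practice, such an analysis of covariance is often carried out with an
ordinary least squares model that includes both the categorical group variable
and the continuous covariate. The sketch below illustrates one such approach
with statsmodels; the variable names (satisfaction, cohabited, pretest) and the
data are hypothetical and serve only to show the form of the model.

    # A minimal analysis-of-covariance sketch, using hypothetical variable names
    # and made-up data for illustration only.
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    data = pd.DataFrame({
        "satisfaction": [6.1, 5.4, 5.9, 6.8, 7.2, 6.5, 7.0, 7.4],
        "cohabited":    ["yes", "yes", "yes", "yes", "no", "no", "no", "no"],
        "pretest":      [5.8, 5.2, 5.5, 6.4, 6.9, 6.1, 6.6, 7.0],
    })

    # Group effect on the outcome, controlling for the continuous covariate (pretest).
    model = ols("satisfaction ~ C(cohabited) + pretest", data=data).fit()
    print(sm.stats.anova_lm(model, typ=2))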

In the classical experimental design, the vari-
able(s) being controlled for—the covariate(s)—is
frequently some background characteristics or pre-
test scores on the outcome variable that were not