would replace the two lowest values (3 and 7) by 12s and the two highest values (36 and
50) by 33s, leaving

12 12 12 15 17 17 18 19 19 19
20 22 24 26 30 32 32 33 33 33

[The variance and any test statistics calculated on this sample would be based on
(N − 1 − 4) df, because we trimmed off four values and replaced them with pseudovalues,
and it is not really fair to pretend that those pseudovalues are real data.] Experiments
with samples containing an unusual number of outliers may profit from trimming and/or
“Winsorizing.” When you run an analysis of variance on trimmed data, however, you
should base the MSerror on the variance of the corresponding Winsorized sample and not
on the variance of the trimmed sample. A fairly readable study of the effect of applying
t tests (and, by extension, the analysis of variance) to trimmed samples was conducted
by Yuen and Dixon (1973): you should read it before running such analyses. You should
also look at papers by Wilcox (1993 and 1995). A useful reference when we come to
multiple comparisons in Chapter 12 is Keselman, Holland, and Cribbie (2005, pp.
1918–1919).
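
To make the Winsorizing arithmetic concrete, here is a minimal Python sketch
(assuming NumPy is available; the helper winsorize() is our own illustrative
function, not part of any statistics package):

import numpy as np

def winsorize(scores, k):
    # Sort, then replace the k lowest values with the (k + 1)th lowest
    # and the k highest values with the (k + 1)th highest.
    x = np.sort(np.asarray(scores, dtype=float))
    x[:k] = x[k]            # 3 and 7 both become 12
    x[-k:] = x[-(k + 1)]    # 36 and 50 both become 33
    return x

raw = [3, 7, 12, 15, 17, 17, 18, 19, 19, 19,
       20, 22, 24, 26, 30, 32, 32, 33, 36, 50]
w = winsorize(raw, k=2)       # reproduces the Winsorized sample shown above
ss = np.sum((w - w.mean()) ** 2)
print(ss / (len(w) - 1 - 4))  # Winsorized variance on (N - 1 - 4) df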

When to Transform and How to Choose a Transformation


You should not get the impression that transformations should be applied routinely to all of
your data. As a rule of thumb, “If it’s not broken, don’t fix it.” If your data are reasonably
distributed (i.e., are more or less symmetrical and have few if any outliers) and if your vari-
ances are reasonably homogeneous, there is probably nothing to be gained by applying a
transformation. If you have markedly skewed data or heterogeneous variances, however,
some form of transformation may be useful. Furthermore, it is perfectly legitimate to shop
around for a transformation that makes the necessary changes to the variance or shape. If a
logarithmic transformation does not do what you want (stabilize the variances or improve
shape), then consider the square-root (or cube-root) transformation. If you have near-zero
values and Y = √(X + 0.5) does not work, try Y = √X + √(X + 1). The only thing that
you should not do is to try out every transformation, looking for one that gives you a sig-
nificant result. (You are trying to optimize the data, not the resulting F.) Finally, if you are
considering using transformations, it would be a good idea to look at Tukey (1977) or
Hoaglin, Mosteller, and Tukey (1983).
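
To illustrate this kind of shopping, the sketch below applies several candidate
transformations to a set of groups and reports how homogeneous the variances
become. The max/min variance ratio is our own rough comparison criterion, not a
rule from the text, and the function names are illustrative:

import numpy as np

def variance_ratio(groups):
    # Ratio of largest to smallest group variance; values near 1 are good.
    v = [np.var(g, ddof=1) for g in groups]
    return max(v) / min(v)

transforms = {
    "none":              lambda x: x,
    "log":               np.log,                      # requires X > 0
    "square root":       np.sqrt,
    "cube root":         np.cbrt,
    "sqrt(X + 0.5)":     lambda x: np.sqrt(x + 0.5),  # for near-zero values
    "sqrt(X)+sqrt(X+1)": lambda x: np.sqrt(x) + np.sqrt(x + 1),
}

def shop_around(groups):
    for name, f in transforms.items():
        t = [f(np.asarray(g, dtype=float)) for g in groups]
        print(f"{name:18s} variance ratio = {variance_ratio(t):.2f}")

Notice that the comparison is made on the variances (or, equally well, on a
measure of skewness), never on the resulting F.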

Resampling


An old but very valuable approach to statistical hypothesis testing that is beginning to win
many more adherents is known as “resampling statistics.” I say a great deal about this ap-
proach in Chapter 18, but before leaving methods for dealing with violations of assump-
tions, I should at least mention that resampling methods offer the opportunity to avoid
some of the assumptions required in the analysis of variance. These methods essentially
create a population that exactly resembles the distribution of obtained data. Then the com-
puter creates samples by drawing randomly, without replacement, from this population as
if the null hypothesis were true, and calculates a test statistic, such as F, for that sample.
This process is then repeated a very large number of times, producing a whole distribution
of F values that would be expected with a true null hypothesis. It is then simple to
calculate how many of these Fs were more extreme than the one from your data, and to
reject, or fail to reject, depending on the result. Students interested in this approach can
jump ahead to Chapter 18.
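
A minimal sketch of such a randomization test for a one-way design, assuming
NumPy and SciPy (scipy.stats.f_oneway supplies the F statistic; the function
name randomization_p is ours), might look like this:

import numpy as np
from scipy.stats import f_oneway

def randomization_p(groups, n_resamples=10_000, seed=1):
    rng = np.random.default_rng(seed)
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    f_obs = f_oneway(*groups).statistic   # F for the data as actually obtained
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)               # drawing without replacement
        pieces, start = [], 0
        for n in sizes:                   # reassign scores to the groups
            pieces.append(pooled[start:start + n])
            start += n
        if f_oneway(*pieces).statistic >= f_obs:
            extreme += 1
    return extreme / n_resamples          # proportion of Fs at least as extreme

If the returned proportion falls below your chosen significance level, you
reject the null hypothesis; otherwise you fail to reject it.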
