not a measure of any difference, between the two occasions. The null hypothesis tested
by the sign test is that the median difference between two sets of scores is zero. Although
theoretically the response variable should be continuous, because only the sign of any
difference is used, the test can be treated as a binomial procedure, hence its location in
Figure 5.1.
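The binomial treatment of the sign test can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `sign_test` and the scores are hypothetical, tied pairs (zero differences) are assumed to be dropped, and the two-sided probability is taken as twice the smaller binomial tail with p = 0.5.

```python
from math import comb

def sign_test(first, second):
    """Two-sided exact sign test for paired scores (illustrative sketch)."""
    diffs = [b - a for a, b in zip(first, second)]
    n_pos = sum(d > 0 for d in diffs)   # number of positive differences
    n_neg = sum(d < 0 for d in diffs)   # number of negative differences
    n = n_pos + n_neg                   # zero differences are dropped
    k = min(n_pos, n_neg)
    # Probability of k or fewer of the rarer sign when each sign has p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    p_value = min(1.0, 2 * tail)
    return n_pos, n_neg, p_value

# Hypothetical scores for eight subjects on two occasions.
first_occasion = [12, 15, 11, 14, 13, 16, 10, 12]
second_occasion = [14, 17, 12, 13, 16, 18, 13, 15]
n_pos, n_neg, p_value = sign_test(first_occasion, second_occasion)
print(n_pos, n_neg, round(p_value, 4))
```

Only the counts of positive and negative differences enter the calculation, so the size of each difference has no influence on the result.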
The Wilcoxon signed ranks test is a more powerful repeated measures test than the
sign test because it uses more of the information in the data, that is, the ranked magnitudes
of the individual differences rather than just their signs. Similar to the t-test,
it is a test of no difference. The null hypothesis is that the sum of the
positive ranks equals the sum of the negative ranks.
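The two rank sums named in the null hypothesis can be computed as in the sketch below. The function name `signed_rank_sums` and the scores are hypothetical; the sketch assumes the usual conventions of dropping zero differences and giving tied absolute differences the average of the ranks they occupy, and it stops at the rank sums rather than looking up a probability.

```python
def signed_rank_sums(first, second):
    """Return (sum of positive ranks, sum of negative ranks) for paired scores."""
    # Non-zero differences, ordered by absolute size.
    diffs = sorted((b - a for a, b in zip(first, second) if b != a), key=abs)
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(diffs):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(diffs) and abs(diffs[j + 1]) == abs(diffs[i]):
            j += 1
        average_rank = (i + j) / 2 + 1   # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical scores for six subjects on two occasions.
first_occasion = [10, 12, 9, 15, 11, 14]
second_occasion = [13, 11, 14, 17, 15, 8]
w_plus, w_minus = signed_rank_sums(first_occasion, second_occasion)
print(w_plus, w_minus)
```

Under the null hypothesis the two sums should be similar; a large imbalance between them is evidence of a systematic difference between the occasions.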
The two-sample Proportions test is a convenient test for the difference between two
proportions or percentages. It is based on the normal approximation to the binomial
distribution, so the combined sample size should be at least about 40, with a minimum of
20 in each group (two-sample test). The normal approximation is also less accurate as the
proportion P in each group moves away from 0.5. The proportions test is much underused
in educational research. A one-sample proportions test can be used when we want to
make an inference for a single proportion—an unknown population proportion can be
estimated from a sample proportion.
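One common form of the two-sample proportions test uses a pooled estimate of the proportion to obtain the standard error and refers the resulting z statistic to the normal distribution. The sketch below assumes that form; the function name `two_proportion_z` and the counts are hypothetical, and no continuity correction is applied.

```python
from math import erfc, sqrt

def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for a difference in two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # common proportion under the null hypothesis
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided area under the normal curve
    return z, p_value

# Hypothetical counts: 30 of 50 in one group and 20 of 50 in the other.
z, p_value = two_proportion_z(30, 50, 20, 50)
print(round(z, 3), round(p_value, 4))
```

Both groups here meet the guideline of at least 20 per group, and the pooled proportion of 0.5 is where the normal approximation is at its most accurate.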
The Binomial test, similar to the sign and proportions tests, uses binomial data.
Unlike these two tests, it is a single sample test, and one binomial population is classified
into two groups. When this test is used, the two proportions (or percentages) should add
up to 1 or 100 per cent (the total sample size). The binomial test is useful when we want
to determine whether observed proportions—yes/no, male/female, etc.—differ from what
would be expected by chance. When data is in a 2×2 contingency table and cell
frequencies are small (<5), Fisher’s exact test should be considered.
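A one-sample binomial test can be sketched as follows. The function name `binomial_test`, the counts and the chance proportion of 0.5 are assumptions made for illustration; the two-sided probability is taken here as the total probability of all outcomes that are no more likely than the observed one, which is one of several conventions in use.

```python
from math import comb

def binomial_test(successes, n, p0=0.5):
    """Two-sided exact binomial test of an observed count against proportion p0."""
    def pmf(k):
        return comb(n, k) * p0 ** k * (1 - p0) ** (n - k)

    observed = pmf(successes)
    # Sum the probabilities of outcomes no more likely than the one observed;
    # the small tolerance guards against floating-point rounding.
    total = sum(pmf(k) for k in range(n + 1) if pmf(k) <= observed * (1 + 1e-7))
    return min(1.0, total)

# Hypothetical sample: 14 'yes' responses out of 20, tested against chance (0.5).
p_value = binomial_test(14, 20, 0.5)
print(round(p_value, 4))
```

The two categories are exhaustive, so the 14 'yes' and 6 'no' responses account for the whole sample of 20.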
Differences between three or more samples
When more than two samples are to be compared and the response variable is distributed
normally then an ANOVA (Analysis of variance) type analysis should be considered in
preference to a series of t-tests. Multiple tests on the same sample increase the
‘experimentwise’ error rate. The most common multiple sample comparison procedure (sometimes called
multiple group comparison procedure) is the F-test. This is a parametric procedure with
similar requirements to the t-test. The null hypothesis tested with the F-test is that the
group means are equal, i.e., $H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_n$. Similar to the t-test, there is a repeated
measures ANOVA which also uses the F-test. In the related ANOVA, unlike the
independent ANOVA, variance in scores due to individual subjects can be treated as a
separate source of error. This confers the same advantage that the repeated t-test has over
the independent t-test. Nonparametric equivalents of the F-test are the Kruskal-Wallis
one-way ANOVA for independent samples, and Friedman’s ANOVA by ranks
procedure for related measures. Both the Kruskal-Wallis and the Friedman procedures
test the null hypothesis that the samples (or repeated measures) come from populations
all with the same median, effectively one population. Both procedures require data to be
ordinal (ranked).
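The quantities discussed in this paragraph can be illustrated with a short sketch. All function names and scores are hypothetical. The experimentwise error calculation assumes the tests are independent, the F statistic is the ratio of between-group to within-group mean squares for independent groups, and the Kruskal-Wallis H statistic is computed from pooled ranks without a correction for ties. Probabilities are not computed, since they require the F and chi-square distributions.

```python
def experimentwise_error(alpha, n_tests):
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

def one_way_f(groups):
    """F statistic for independent groups (between / within mean square)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from pooled ranks (no tie correction applied)."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {}
    for value in set(pooled):
        first_rank = pooled.index(value) + 1
        count = pooled.count(value)
        rank_of[value] = first_rank + (count - 1) / 2   # tied scores share a rank
    n_total = len(pooled)
    term = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * term - 3 * (n_total + 1)

# Hypothetical scores for three independent groups.
groups = [[11, 12, 13], [14, 15, 16], [17, 18, 19]]
error_rate = experimentwise_error(0.05, 3)   # three pairwise t-tests at 0.05
f_stat = one_way_f(groups)
h_stat = kruskal_wallis_h(groups)
print(round(error_rate, 4), round(f_stat, 2), round(h_stat, 2))
```

With three pairwise t-tests at the 0.05 level the experimentwise error rate rises to roughly 0.14, which is the reason for preferring a single overall test of the three groups.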
When three or more samples are to be compared and data is in the form of counts
(frequencies) then two parametric procedures should be considered. If the samples (three
Statistical analysis for education and psychology researchers 124