input power alpha diff sd n;   /* -9 marks the value POWER3 is to calculate */
cards;
-9    0.001  0.6  3.676   828
0.80  0.05   0.8  20.715  -9
;
Figure 5.7: SAS code for POWER3 programme to
determine power and sample size for the related
groups design (Example 5.5)
Comparison of Two Means (paired data)
Finding the power
alpha    diff    sd        n       calculated value of power
0.001    0.6     3.676     828     0.92

Comparison of Two Means (paired data)
Finding number of subjects (n)
power    alpha    diff    sd        calculated value of n
0.8      0.05     0.8     20.715    5263
Figure 5.8: Estimated power for difference between
means (related measures), output from SAS
programme POWER3
Notice in Figure 5.7 that in the first line of data input, power is set to −9. This is
evaluated as 92 per cent power, as shown in Figure 5.8. In the second line of data input,
sample size is set to −9; this is evaluated as a required sample size of 5263, also shown
in Figure 5.8.
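The source of POWER3 is not reproduced here, so the following is only a sketch of the kind of calculation it performs, under stated assumptions: it uses the normal approximation to the two-sided paired t-test, assumes `sd` plays the role of the standard deviation of the paired differences, and borrows the −9 convention for the unknown quantity. The function names (`paired_power`, `paired_n`, `solve`) are hypothetical. Under these assumptions it reproduces the two values in Figure 5.8, although POWER3 itself may compute them differently.

```python
from math import ceil, sqrt
from statistics import NormalDist

MISSING = -9  # convention borrowed from POWER3: the unknown is entered as -9

_std_normal = NormalDist()  # standard normal distribution


def z(p):
    """Standard normal quantile (inverse CDF)."""
    return _std_normal.inv_cdf(p)


def paired_power(alpha, diff, sd, n):
    """Approximate power of a two-sided paired test (normal approximation).

    Assumes sd is the standard deviation of the paired differences.
    """
    shift = abs(diff) / sd * sqrt(n)  # expected value of the test statistic
    crit = z(1 - alpha / 2)           # two-sided critical value
    # Probability of landing in either rejection region.
    return _std_normal.cdf(shift - crit) + _std_normal.cdf(-shift - crit)


def paired_n(power, alpha, diff, sd):
    """Sample size from the usual normal-approximation formula, rounded up."""
    n = ((z(1 - alpha / 2) + z(power)) * sd / diff) ** 2
    return ceil(n)


def solve(power, alpha, diff, sd, n):
    """Solve for whichever of power or n is entered as -9."""
    if power == MISSING:
        return "power", round(paired_power(alpha, diff, sd, n), 2)
    if n == MISSING:
        return "n", paired_n(power, alpha, diff, sd)
    raise ValueError("exactly one of power or n must be -9")


# The two data lines from Figure 5.7.
rows = [
    (-9, 0.001, 0.6, 3.676, 828),
    (0.80, 0.05, 0.8, 20.715, -9),
]
for row in rows:
    print(solve(*row))
```

Running it prints `('power', 0.92)` and `('n', 5263)`. A calculation based on the noncentral t distribution, rather than the normal approximation, would typically give a marginally larger n.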
In reporting the results the authors comment, ‘The paired sample t-tests suggested that
the WSADSW programme was effective in bringing about an improvement in teacher
stress, discipline policies...but that it made no difference to the mean levels of student
misbehaviour’ (p. 34). From Figure 5.8 it is evident that the obtained power for the
teachers’ psychological distress variable exceeds the 80 per cent level. The researchers
therefore have a 92 per cent chance (at an alpha of 0.001) of detecting a difference as small as the one reported.
To detect a difference of 0.8 in teachers’ perceptions of student misbehaviour with 80 per
cent power and an alpha of 0.05, a sample size of 5263 would be required. It is not
surprising that the authors report there is no significant change at the 0.10 level on this
variable.
We should pause for a moment to reflect on the finding of this power analysis. Why
should we require such a large sample size? On reading the full article you might notice
that the variance on the student misbehaviour variable is more than 4 times larger than
the variance on any of the other measures reported by the authors. Recall from section 5.4
that the more homogeneous the groups are (less variability), the easier it is to detect
differences. With such large variability in this measure, a large sample is required to
detect a relatively small difference.
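Under the normal approximation (an assumption; the exact formula used by POWER3 is not shown here), the required sample size for the paired design is given below, where σ is the standard deviation entered as sd, δ is the difference to be detected and 1 − β is the desired power. Because n grows with σ², a variance four times larger requires roughly four times as many subjects to detect the same δ at the same α and power.

$$
n \approx \left(\frac{(z_{1-\alpha/2} + z_{1-\beta})\,\sigma}{\delta}\right)^{2}
  = \left(\frac{(1.95996 + 0.84162)\times 20.715}{0.8}\right)^{2}
  \approx 5262.6
$$

Rounding up gives the 5263 subjects reported in Figure 5.8.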
Considering interpretation of these findings there certainly appears to be some change