The relationship between the chosen significance level alpha (usually p≤0.05 or
p≤0.001), the effect size, statistical power and sample size is complex but essential to
understand if an efficient study is to be planned. It is important to consider the statistical
power of any inferential tests prior to collecting data because, if the power is too low,
the researcher has limited options, namely:
- increase the sample size to attain adequate statistical power;
- increase alpha, the probability of making a Type I error, that is, the level of significance
for the test (this has the effect of reducing β because α and β are inversely related);
- or, in the most drastic scenario, abandon the study or completely revise the design (for
example, change from an independent to a repeated measures design).
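The first two options listed above can be made concrete with a rough calculation. The following is a minimal sketch, not taken from the text: it approximates the power of a two-sided comparison of two independent group means using the normal distribution rather than the exact noncentral t distribution, and it assumes equal group sizes. The function name `approx_power` and its parameters are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test (normal approximation).

    d            -- standardized effect size (mean difference / common SD); assumed known
    n_per_group  -- number of observations in each group (equal groups assumed)
    alpha        -- Type I error rate (significance level)
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    # Expected value of the test statistic when the true effect is d
    shift = d * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Illustrative calls: same effect size, varying sample size and alpha
print(approx_power(0.5, 30))
print(approx_power(0.5, 64))
print(approx_power(0.5, 30, alpha=0.10))
```

Under this approximation, raising either `n_per_group` or `alpha` while holding `d` fixed increases the returned power, and with `d = 0` the function returns `alpha` itself, which is the Type I error rate.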
What influences the sensitivity of a design and our ability to detect a significant difference?
There are four interrelated features of a study design that can influence the detection of
significant differences, hence the statistical power of a test: sample size; the population
variability on the measures of interest; alpha (the Type I error rate); and the effect size
(magnitude of difference or relationship) that we are trying to detect.
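One way to see how these four features interact is to fix three of them and solve for the fourth. The sketch below is an illustration under stated assumptions, not a procedure given in the text: it uses the common normal-approximation formula for the per-group sample size of a two-sided, two-group comparison with equal group sizes and a common standard deviation. The function name `required_n_per_group` and the example values are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group comparison.

    delta -- raw difference between means that we want to detect (effect)
    sigma -- population standard deviation of the measure (variability)
    alpha -- Type I error rate
    power -- desired probability of detecting the difference (1 - beta)
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    # n per group = 2 * ((z_alpha + z_beta) * sigma / delta)^2, rounded up
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Illustrative values only: detect a 5-point difference when sigma is 10
print(required_n_per_group(delta=5, sigma=10))
# Greater variability, a smaller effect, or a stricter alpha all demand more cases
print(required_n_per_group(delta=5, sigma=15))
print(required_n_per_group(delta=3, sigma=10))
print(required_n_per_group(delta=5, sigma=10, alpha=0.01))
```

Because this relies on the normal rather than the t distribution, the result tends to be slightly smaller than an exact calculation would give, so it is best read as a planning estimate.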
Sample size and statistical power
The effect of sample size is related to both variability of measures and statistical power
(the probability of detecting a difference should one exist). These effects can be
illustrated by considering the standard error of the mean (SEM). From the Central Limit
Theorem, the sampling distribution of means is approximately normally
distributed with mean μ and a variance of σ^2/n (the standard deviation is σ/√n, usually
called the standard error; in this example it is the SEM). When computing many test
statistics the denominator is usually a standard error; for example, the t-statistic
(independent) is evaluated as the difference between two sample
means divided by the standard error of the difference between the sample means.
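The two quantities just described can be computed directly from sample data. The sketch below is a hedged illustration: it estimates the SEM as the sample standard deviation divided by √n, and it forms the independent t-ratio using a pooled variance, which assumes the two groups have equal population variances (the text does not specify which form of the standard error of the difference is intended). The function names `sem` and `independent_t` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def sem(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return sqrt(variance(sample) / len(sample))


def independent_t(a, b):
    """Independent-samples t-ratio, assuming equal population variances.

    Returns (t, df): the difference between the sample means divided by
    the standard error of that difference, and the degrees of freedom.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    # Standard error of the difference between the two sample means
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se_diff, df


# Made-up scores for two small groups
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
print(sem(group_a))
print(independent_t(group_a, group_b))
```

The resulting t-ratio would then be compared with the critical value for the returned degrees of freedom at the chosen α.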
You can think about the standard error of the difference between two means as
representing, under the null hypothesis, the variability expected in the differences
between the means of pairs of samples drawn from a single population. If a t-test is
performed and the calculated t-statistic (or, more correctly, the obtained t-ratio) is
sufficiently large when compared with the critical value in a table of t-values (with
appropriate df), then statistical significance is attained (at a specified α). The two sample means are said to be
significantly different. The importance of sample size is most noticeable if we think about
the denominator in the t-ratio, the standard error of the difference between two means. As
the sample size, n, increases, then the value of the standard error decreases. This is easily
shown if you consider the SEM of a single mean, i.e. σ/√n. If you divide by a larger
number (√n), the SEM is reduced. The same principle applies to the standard error of the
difference between two means as evaluated in the t-ratio. The standard error of the
difference forms the denominator in the t-ratio and hence a smaller value increases the
size of t. Larger t-values increase the chance of attaining statistical significance for a
given magnitude of effect (effect size). Larger sample sizes yield larger degrees of
Statistical analysis for education and psychology researchers 132