You may be wondering why researchers use such a circuitous procedure when testing a
hypothesis. That is, when the question of real interest is the alternative hypothesis,
why do we pretend to believe its opposite, the null hypothesis, and hope this can be refuted so
that we are then able to consider the alternative hypothesis as tenable? Put simply, why not
test the alternative hypothesis directly?
There are at least three answers to these questions. First, statisticians are generally
cautious and have over the years developed a tried-and-tested approach to making
inferences about a population of interest. They work with the idea of a null hypothesis,
which describes a possible situation in the population. The accepted and proven
convention is to assume that there is no difference between two parameters and to uphold
this belief until we can provide evidence that it is no longer tenable. This indirect
approach has worked well in the past and so it continues to be used.
A second and briefer answer is that statistical inference cannot prove
anything; it can only provide evidence, in the form of probabilities, that a proposition
is not reasonable. A third answer is that if we tested the alternative hypothesis directly we
would be in danger of selectivity, testing only hypotheses which fit in with our thinking. That
is, certain evidence would fit in with our thinking and support our selected alternative
hypothesis. In this situation negative evidence would have no effect, because absence of
proof (i.e., we do not test the hypotheses that are inconsistent with our beliefs) is not the
same as proof of absence. Moreover, logic suggests that it is virtually impossible to
prove the absence of anything. The received wisdom is that it is better to assume absence of
proof, i.e., the null hypothesis, until we have positive evidence.
p-values and Statistical Significance
Statistical tests provide probabilities, or p-values, for test statistics. These probabilities
indicate how likely the obtained results would be if chance (sampling error) alone were
operating. Researchers interpret results as statistically significant when
differences between treatments or test scores are greater than would be expected from
sampling error, that is, when the difference is unlikely to be attributable to chance alone. By convention, p-
values of 0.05 or less are generally regarded as statistically significant. A p-value
of ≤0.05 derived from a statistical test represents the chance of observing the obtained results,
or results more extreme (and consequently rejecting the null hypothesis), given that the null
hypothesis is true. Recall that it is the null hypothesis we test, which is why it is sometimes
called the statistical hypothesis.
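One way to see this in practice is the following minimal Python sketch (not part of the original text): it uses scipy.stats to run an independent-samples t-test on two made-up sets of test scores and compares the resulting p-value with the conventional 0.05 level. The data, group labels and alpha level are illustrative assumptions only.

    # Illustrative sketch: hypothetical test scores for two teaching methods.
    from scipy import stats

    group_a = [52, 60, 55, 64, 58, 61, 57, 63]   # scores under method A (made up)
    group_b = [48, 50, 53, 47, 55, 49, 52, 51]   # scores under method B (made up)

    # Independent-samples t-test of H0: the two population means are equal.
    t_stat, p_value = stats.ttest_ind(group_a, group_b)

    alpha = 0.05  # conventional significance level
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
    if p_value <= alpha:
        print("Result significant at the 5 per cent level: reject H0.")
    else:
        print("Result not significant: retain H0.")

The decision rule is simply a comparison: if the p-value is at or below the chosen level, the null hypothesis is rejected; otherwise it is retained.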
When we state that results are significant at p≤0.05, this implies that the conditional
probability of obtaining such results simply by chance (given that H0 is true) is less than
or equal to 1 in 20 (i.e., 5 in 100, or 5 per cent). In education and psychology, by convention,
odds of 1 in 20 (p≤0.05) or 1 in 100 (p≤0.01) are used as the basis for rejecting a null
hypothesis.
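To make the arithmetic of these conventions explicit, here is a small illustrative fragment; the exact p-value of 0.032 is hypothetical, of the kind a statistical package might print, and is not taken from the text.

    # Sketch: comparing an exact p-value (as a package might report it) with alpha.
    alpha_05 = 1 / 20    # 0.05, i.e., odds of 1 in 20
    alpha_01 = 1 / 100   # 0.01, i.e., odds of 1 in 100

    reported_p = 0.032   # hypothetical exact p-value from a statistical package

    print("Significant at the 5 per cent level:", reported_p <= alpha_05)  # True
    print("Significant at the 1 per cent level:", reported_p <= alpha_01)  # False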
What p-value should count as significant is up to the researcher, although there are
conventions of 5 per cent and 1 per cent significance. The level of statistical significance
selected by a researcher, called the ALPHA level, α (usually 5 per cent or 1 per cent),
should be distinguished from the p-value associated with a test statistic. This sometimes
causes confusion when statistical packages are used, because they often report the exact
p-value for a statistical test rather than p≤0.05 or p≤0.01. The alpha level of significance