P_II(δ(X)) ≤ 1 − α for any θ ∈ Θ_1
So if a test is unbiased, we reject the null hypothesis when it is in fact false
with probability at least α. Consequently, the power of this test is at least α
for all θ ∈ Θ_1. Conversely, a test is biased if the probability of rejecting
the null hypothesis falls below the significance level α for some θ ∈ Θ_1, for
instance, for the highest parameter values.
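As a minimal sketch of the unbiasedness condition, consider a one-sided z-test for the mean of a normal sample with known standard deviation. This setup is an assumption chosen for illustration and is not taken from the text; the function name power_one_sided_z and the parameters mu0, sigma, and n are hypothetical. The code evaluates the power at several alternatives θ ∈ Θ_1 and the matching type II error probability P_II = 1 − power.

```python
from statistics import NormalDist

def power_one_sided_z(mu, mu0=0.0, sigma=1.0, n=25, alpha=0.05):
    """Probability of rejecting H0: mu <= mu0 when the true mean is mu.

    Assumed setup (illustrative only): X_1, ..., X_n are i.i.d. N(mu, sigma^2)
    with sigma known, and H0 is rejected when sqrt(n) * (mean - mu0) / sigma
    exceeds the (1 - alpha) quantile of the standard normal distribution.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    shift = (mu - mu0) * n ** 0.5 / sigma
    return 1.0 - NormalDist().cdf(z_crit - shift)

alpha = 0.05
alternatives = [0.01, 0.1, 0.25, 0.5, 1.0]  # values of mu in Theta_1 (mu > mu0 = 0)
powers = [power_one_sided_z(mu, alpha=alpha) for mu in alternatives]
type_two = [1.0 - p for p in powers]        # P_II at each alternative

for mu, p, b in zip(alternatives, powers, type_two):
    print(f"mu = {mu:4.2f}  power = {p:.4f}  P_II = {b:.4f}")
```

In this particular setup the power never drops below α on Θ_1, so P_II never exceeds 1 − α, which is the unbiasedness condition stated above. At the boundary value mu = mu0 the rejection probability equals α.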
Consistent Test
Up to this point, the requirement of a good test has been to produce as few
errors as possible. We pursue this by first limiting the test size to some
level α and then looking for the highest power available at that significance
level.
By construction, each of our tests δ(X) is based on some test statistic
t(X). For this test statistic, we construct an acceptance region as well as a
critical region such that, given certain parameter values, the test statistic
falls into either one of these regions with limited probability. The behavior
of the test statistic may change as we increase the sample size n. For example,
it may be desirable to have a test of size α whose probability of a type II
error vanishes as n grows.
From now on, we will consider tests that are based on test statistics that
fall into their respective critical regions Δ_C with increasing probability,
under the alternative hypothesis, as the number of sample drawings n tends to
infinity. That is, these tests reject the null hypothesis more and more
reliably when they actually should (i.e., θ ∈ Θ_1) for ever larger samples.
In the optimal situation, these tests reject the null hypothesis
(i.e., δ(X) = d_1) with a probability approaching 100% when the alternative
hypothesis holds. This brings us to the next definition.
Consistent test. A test of size α is consistent if its power converges to one
for every θ ∈ Θ_1 as the sample size n tends to infinity.
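The definition can be illustrated with a short sketch that holds the size fixed at α and evaluates the power at a single alternative for growing n. The one-sided z-test with known standard deviation used here is an assumption made for illustration, and the name power_at and the values mu1 = 0.2 and sigma = 1 are hypothetical.

```python
from statistics import NormalDist

def power_at(n, mu1=0.2, mu0=0.0, sigma=1.0, alpha=0.05):
    """Power of an assumed one-sided z-test of size alpha at the alternative mu1."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    return 1.0 - NormalDist().cdf(z_crit - (mu1 - mu0) * n ** 0.5 / sigma)

sample_sizes = [10, 50, 100, 500, 1000]
power_by_n = [power_at(n) for n in sample_sizes]

for n, p in zip(sample_sizes, power_by_n):
    # The type II error probability is the complement of the power.
    print(f"n = {n:5d}  power = {p:.6f}  type II error = {1.0 - p:.6f}")
```

For this assumed test the power increases with n and approaches one while the size stays at α, so the type II error probability vanishes for the chosen alternative.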
Recall that in our coverage of point estimates we introduced the consistent
estimator, which has the desirable feature that the probability of it deviating
noticeably from its expected value vanishes as the sample size grows. So, with
increasing probability, it assumes values arbitrarily close to this expected
value such that eventually it becomes virtually indistinguishable from it. The
use of such a statistic for the test leads to the following desirable
characteristic: The test statistic will cease to assume values that are extreme
under the respective hypothesis, so that it will almost always end up in the
acceptance region when the null hypothesis holds, and in the rejection region
under the alternative hypothesis.
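A small simulation can make this behavior visible. The sketch below is a hedged illustration and not the book's procedure: it assumes normal data with standard deviation 1, uses the sample mean as the consistent statistic, and rejects whenever the mean exceeds a fixed cutoff of 0.15 placed between a hypothetical null value 0 and a hypothetical alternative value 0.3. With a fixed cutoff the size is not held at α but shrinks with n. The function name rejection_rate and all numeric values are assumptions.

```python
import random

def rejection_rate(mu, n, cutoff=0.15, sigma=1.0, reps=300, seed=0):
    """Fraction of simulated samples whose mean lands above the fixed cutoff."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    rejections = 0
    for _ in range(reps):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean > cutoff:
            rejections += 1
    return rejections / reps

results = {}
for n in (10, 100, 1000):
    under_null = rejection_rate(mu=0.0, n=n)  # null value (assumed): mu = 0
    under_alt = rejection_rate(mu=0.3, n=n)   # alternative value (assumed): mu = 0.3
    results[n] = (under_null, under_alt)
    print(f"n = {n:5d}  rejections under H0 = {under_null:.3f}  under H1 = {under_alt:.3f}")
```

As n grows, the sample mean concentrates around the true mean, so in this setup the rejection frequency should fall toward zero under the null value and rise toward one under the alternative value.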