SECTION 6.4 Confidence Intervals 391
If $b_1, b_2, \ldots, b_n$ are the observed outcomes, i.e.,
$$b_i = \begin{cases} 1 & \text{if type A is observed;} \\ 0 & \text{if type B is observed,} \end{cases}$$
then the relevant test statistic is
$$\hat{p} = \frac{b_1 + b_2 + \cdots + b_n}{n}.$$
Notice that since we don’t know $p$ (we’re trying to estimate it), we
know neither the mean nor the variance of the test statistic. With
a large enough sample, $\hat{P}$ will be approximately normally distributed
with mean $p$ and variance $p(1-p)/n$. Therefore
$$\frac{\hat{P} - p}{\sqrt{p(1-p)/n}}$$
will be approximately normal with mean 0 and variance 1. The problem with
the above is all of the occurrences of the unknown $p$. The remedy is to
approximate the variance $\frac{p(1-p)}{n}$ by the sample variance based on
$\hat{p}$: $\frac{\hat{p}(1-\hat{p})}{n}$. Therefore, we may regard
$$Z = \frac{\hat{P} - p}{\sqrt{\hat{P}(1-\hat{P})/n}}$$
as being approximately normally distributed with mean 0 and variance
1. Having this, we now build our $(1-\alpha)\times 100\%$ confidence intervals
based on the values $z_{\alpha/2}$ taken from the normal distribution with mean 0
and variance 1. That is to say, the $(1-\alpha)\times 100\%$ confidence interval
for the population proportion $p$ is
$$\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right].$$
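As an illustration, the interval above can be computed directly from a list of 0/1 outcomes. The following is a minimal sketch, not part of the original text: the function name `proportion_confidence_interval` and the sample data (40 observations of type A out of 100) are hypothetical, and the critical value $z_{\alpha/2}$ is obtained from the standard normal distribution via Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(outcomes, alpha=0.05):
    """Return (lower, upper) for the large-sample (1 - alpha) interval for p.

    `outcomes` is a sequence of 0/1 values b_1, ..., b_n
    (1 = type A observed, 0 = type B observed).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one observation")
    p_hat = sum(outcomes) / n                      # sample proportion
    z = NormalDist().inv_cdf(1 - alpha / 2)        # critical value z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 40 outcomes of type A and 60 of type B.
outcomes = [1] * 40 + [0] * 60
lower, upper = proportion_confidence_interval(outcomes, alpha=0.05)
print(f"95% confidence interval: ({lower:.4f}, {upper:.4f})")
```

With these made-up data, $\hat{p} = 0.4$ and the 95% interval is roughly $(0.304, 0.496)$.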
Caution: If we are trying to estimate a population parameter which
we know to be either very close to 0 or very close to 1, the method
above performs rather poorly unless the sample size is very large.
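
One way to see this caution at work is to compute how often the interval actually contains $p$. The sketch below is an illustration under stated assumptions, not a result from the text: the function name `wald_coverage` and the choices $n = 50$, $p = 0.5$ and $p = 0.01$ are hypothetical. It sums the exact binomial probabilities of those counts $k$ whose interval, built from $\hat{p} = k/n$, contains the true $p$.

```python
import math
from statistics import NormalDist


def wald_coverage(p, n, alpha=0.05):
    """Exact probability that the large-sample interval contains the true p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)        # critical value z_{alpha/2}
    coverage = 0.0
    for k in range(n + 1):                         # k = number of type A outcomes
        p_hat = k / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            # Add the binomial probability of observing exactly k successes.
            coverage += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return coverage


for p in (0.5, 0.01):
    print(f"p = {p}: coverage of nominal 95% interval = {wald_coverage(p, 50):.3f}")
```

For $p = 0.01$ and $n = 50$, a sample with no type A outcomes gives $\hat{p} = 0$ and the degenerate interval $[0, 0]$, which misses $p$; such a sample occurs with probability $0.99^{50} \approx 0.605$, so the actual coverage falls far below the nominal 95%. For $p = 0.5$ the coverage stays close to the nominal level.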