Example 4.7 Suppose that the true proportion of smokers in a community is
known to be in the vicinity of p = 0.4, and we want to estimate it using a
sample of size n = 100. The central limit theorem indicates that the sample
proportion p follows a
normal distribution with mean
\[ \mu_p = 0.40 \]
and variance
\[ \sigma_p^2 = \frac{(0.4)(0.6)}{100} \]
or standard error
\[ \sigma_p = 0.049 \]
Suppose that we want our estimate to be correct within ±3%; it follows that
\[
\begin{aligned}
\Pr(0.37 \le p \le 0.43) &= \Pr\!\left(\frac{0.37 - 0.40}{0.049} \le z \le \frac{0.43 - 0.40}{0.049}\right) \\
&= \Pr(-0.61 \le z \le 0.61) \\
&= (2)(0.2291) \\
&= 0.4582 \quad \text{or approximately } 46\%
\end{aligned}
\]
That means that if we use the proportion of smokers from a sample of n = 100
to estimate the true proportion of smokers, we are correct within ±3% only
about 46% of the time; this figure would be about 95% (95.5% when the standard
error is rounded to 0.015) if the sample size were raised to n = 1000. What we
learn from this example is that, compared to the case of continuous data in
Example 4.2, it should take a much larger sample to obtain a good estimate of
a proportion such as a disease prevalence or a drug side effect.
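The calculation in this example can be checked numerically. The following is a minimal sketch in Python, assuming the normal approximation used in the example; the function names `normal_cdf` and `prob_within` are illustrative and not part of the text. It works without intermediate rounding, so its results differ slightly from figures obtained by rounding the standard error and reading a normal table (roughly 0.46 for n = 100 and 0.95 for n = 1000).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(p: float, n: int, margin: float) -> float:
    """Normal-approximation probability that a sample proportion from a
    sample of size n falls within +/- margin of the true proportion p."""
    se = math.sqrt(p * (1.0 - p) / n)  # standard error of the sample proportion
    z = margin / se                    # margin expressed in standard-error units
    return 2.0 * normal_cdf(z) - 1.0   # Pr(-z <= Z <= z)


# Values from Example 4.7: p = 0.4, margin = 0.03, n = 100 and n = 1000
for n in (100, 1000):
    se = math.sqrt(0.4 * 0.6 / n)
    print(f"n = {n}: SE = {se:.4f}, Pr(within 0.03) = {prob_within(0.4, n, 0.03):.3f}")
```

Because the standard error shrinks only with the square root of n, a tenfold increase in sample size reduces it by a factor of about 3.2, which is what moves the probability from under one half to about 95%.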
From this sampling distribution of the sample proportion, in the context of
repeated sampling, we have an approximate 95% confidence interval for a
population proportion p:
\[ p \pm 1.96\,\mathrm{SE}(p) \]
where, again, the standard error of the sample proportion is calculated from
\[ \mathrm{SE}(p) = \sqrt{\frac{p(1 - p)}{n}} \]
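As an illustration of the interval formula, here is a minimal sketch in Python. The function name `proportion_ci` and the counts (40 smokers in a sample of 100) are hypothetical, chosen only to match the scale of Example 4.7; they are not data from the text.

```python
import math
from typing import Tuple


def proportion_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate confidence interval p +/- z * SE(p) for a proportion.

    With z = 1.96 this is the approximate 95% interval; it relies on the
    normal approximation and is intended for large samples only.
    """
    p = successes / n                  # sample proportion
    se = math.sqrt(p * (1.0 - p) / n)  # SE(p) = sqrt(p(1 - p) / n)
    return p - z * se, p + z * se


# Hypothetical data: 40 smokers observed in a sample of 100
low, high = proportion_ci(40, 100)
print(f"Approximate 95% CI: ({low:.3f}, {high:.3f})")
```

With these hypothetical counts the interval is roughly 0.30 to 0.50, which shows how wide the estimate remains at n = 100.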
There are no easy ways for small samples; this is applicable only to larger