Example 4.7 Suppose that the true proportion of smokers in a community is
known to be in the vicinity of $\pi = 0.4$, and we want to estimate it using a
sample of size $n = 100$. The central limit theorem indicates that the sample proportion $p$ follows, approximately, a
normal distribution with mean
\[ \mu_p = 0.40 \]
and variance
\[ \sigma_p^2 = \frac{(0.4)(0.6)}{100} \]
or standard error
\[ \sigma_p = 0.049 \]
Suppose that we want our estimate to be correct within $\pm 3\%$; it follows that
\[
\begin{aligned}
\Pr(0.37 \le p \le 0.43) &= \Pr\left(\frac{0.37 - 0.40}{0.049} \le z \le \frac{0.43 - 0.40}{0.049}\right) \\
&= \Pr(-0.61 \le z \le 0.61) \\
&= (2)(0.2291) \\
&= 0.4582 \text{ or approximately } 46\%
\end{aligned}
\]
That means if we use the proportion of smokers from a sample of $n = 100$ to
estimate the true proportion of smokers, only about 46% of the time are we
correct within $\pm 3\%$; this figure would be about 95% (95.5% if the standard error is rounded to 0.015) if the sample size is raised to
$n = 1000$. What we learn from this example is that compared to the case of
continuous data in Example 4.2, it should take a much larger sample to have a
good estimate of a proportion such as a disease prevalence or a drug side effect.
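The calculation in this example can be checked numerically. The following is a minimal sketch, assuming the normal approximation used above; the function names `normal_cdf` and `prob_within` and the parameter name `true_prop` are illustrative and do not come from the text. Because it does not round the standard error or the z value, its results (roughly 0.46 for a sample of 100 and roughly 0.95 for a sample of 1000) differ slightly from the table-based figures.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_within(true_prop: float, n: int, margin: float) -> float:
    """Approximate probability that a sample proportion from n subjects
    falls within +/- margin of the true proportion (normal approximation)."""
    se = math.sqrt(true_prop * (1.0 - true_prop) / n)  # standard error
    z = margin / se                                     # margin in SE units
    return 2.0 * normal_cdf(z) - 1.0                    # Pr(-z <= Z <= z)


# True proportion 0.4, margin of 3 percentage points, two sample sizes.
for n in (100, 1000):
    print(n, round(prob_within(0.4, n, 0.03), 3))
```

The same function can be called with other sample sizes to see how quickly the probability of being within the margin grows with the sample size.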
From this sampling distribution of the sample proportion, in the context of
repeated sampling, we have an approximate 95% confidence interval for a
population proportion $\pi$:
\[ p \pm 1.96\,\mathrm{SE}(p) \]
where, again, the standard error of the sample proportion is calculated from
\[ \mathrm{SE}(p) = \sqrt{\frac{p(1 - p)}{n}} \]
There are no easy ways for small samples; this is applicable only to larger