Introductory Biostatistics

(Chris Devlin) #1

Example 4.7 Suppose that the true proportion of smokers in a community is
known to be in the vicinity of $\pi = 0.4$, and we want to estimate it using a
sample of size $n = 100$. The central limit theorem indicates that the sample
proportion $p$ follows a
normal distribution with mean


\[ \mu_p = 0.40 \]

and variance


\[ \sigma_p^2 = \frac{(0.4)(0.6)}{100} \]

or standard error


\[ \sigma_p = 0.049 \]

Suppose that we want our estimate to be correct within $\pm 3\%$; it follows that


\[
\begin{aligned}
\Pr(0.37 \le p \le 0.43) &= \Pr\left(\frac{0.37 - 0.40}{0.049} \le z \le \frac{0.43 - 0.40}{0.049}\right) \\
&= \Pr(-0.61 \le z \le 0.61) \\
&= (2)(0.2291) \\
&= 0.4582 \quad \text{or approximately 46\%}
\end{aligned}
\]

That means if we use the proportion of smokers from a sample of $n = 100$ to
estimate the true proportion of smokers, we are correct within $\pm 3\%$ only
about 46% of the time; this figure would rise to about 95% (95.5% when the
standard error is rounded to 0.015) if the sample size is raised to
$n = 1000$. What we learn from this example is that, compared to the case of
continuous data in Example 4.2, it should take a much larger sample to have a
good estimate of a proportion such as a disease prevalence or a drug side effect.
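
The calculation in this example can be checked numerically. The following is a minimal sketch that uses only the Python standard library; the function name `prob_within` and its parameter names are illustrative assumptions, not notation from the text. It uses the same normal approximation, but without rounding the standard error or the z value, so its results may differ slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist


def prob_within(true_prop, n, margin):
    """Normal-approximation probability that the sample proportion
    falls within +/- margin of the true proportion.

    Hypothetical helper for illustration; not from the text.
    """
    # Standard error of the sample proportion: sqrt(pi * (1 - pi) / n)
    se = sqrt(true_prop * (1 - true_prop) / n)
    # Standardize the margin to a z value
    z = margin / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Pr(-z <= Z <= z)
    return std_normal.cdf(z) - std_normal.cdf(-z)


# Compare the two sample sizes discussed in the example
for n in (100, 1000):
    print(n, round(prob_within(0.4, n, 0.03), 3))
```

With a true proportion of 0.4 and a margin of 0.03, the probability is close to 0.46 for a sample of 100 and close to 0.95 for a sample of 1000, which illustrates how slowly precision improves with sample size for a proportion.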


From this sampling distribution of the sample proportion, in the context of
repeated sampling, we have an approximate 95% confidence interval for a
population proportion $\pi$:


\[ p \pm 1.96\,\mathrm{SE}(p) \]

where, again, the standard error of the sample proportion is calculated from


\[ \mathrm{SE}(p) = \sqrt{\frac{p(1 - p)}{n}} \]
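
As an illustration of the interval formula, here is a minimal sketch in Python. The function name `proportion_ci_95` and the data (40 smokers observed in a sample of 100) are hypothetical, chosen only to match the scale of Example 4.7; the approximation is intended for large samples.

```python
from math import sqrt


def proportion_ci_95(successes, n):
    """Approximate 95% confidence interval for a population proportion,
    computed as p +/- 1.96 * SE(p). Hypothetical helper for illustration.
    """
    p = successes / n              # sample proportion
    se = sqrt(p * (1 - p) / n)     # standard error of the sample proportion
    return p - 1.96 * se, p + 1.96 * se


# Hypothetical data: 40 smokers in a sample of 100
low, high = proportion_ci_95(40, 100)
print(f"{low:.3f} to {high:.3f}")  # prints "0.304 to 0.496"
```

For these assumed data the interval runs from roughly 30% to 50%, which is consistent with the earlier observation that a sample of 100 gives a fairly imprecise estimate of a proportion.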

There are no easy ways for small samples; this is applicable only to larger

