Introductory Biostatistics

Example 4.7 Suppose that the true proportion of smokers in a community is
known to be in the vicinity of $\pi = 0.4$, and we want to estimate it using a
sample of size $n = 100$. The central limit theorem indicates that the sample
proportion $p$ follows a normal distribution with mean

\[ \mu_p = 0.40 \]

and variance

\[ \sigma_p^2 = \frac{(0.4)(0.6)}{100} \]

or standard error

\[ \sigma_p = 0.049 \]

Suppose that we want our estimate to be correct within ±3%; it follows that

\[
\Pr(0.37 \le p \le 0.43)
  = \Pr\left( \frac{0.37 - 0.40}{0.049} \le z \le \frac{0.43 - 0.40}{0.049} \right)
  = \Pr(-0.61 \le z \le 0.61)
  = (2)(0.2291)
  = 0.4582
\]

or approximately 46%.

That means if we use the proportion of smokers from a sample of $n = 100$ to
estimate the true proportion of smokers, only about 46% of the time are we
correct within ±3%; this figure would be 95.5% if the sample size is raised to
$n = 1000$. What we learn from this example is that compared to the case of
continuous data in Example 4.2, it should take a much larger sample to have a
good estimate of a proportion such as a disease prevalence or a drug side effect.
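
The calculation in Example 4.7 can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `prob_within` and its parameters are illustrative assumptions, and the standard normal table lookup is replaced by Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def prob_within(true_prop, n, margin):
    """Normal-approximation probability that the sample proportion
    lies within +/- margin of the true proportion (hypothetical helper)."""
    # Standard error of the sample proportion: sqrt(pi * (1 - pi) / n)
    se = sqrt(true_prop * (1 - true_prop) / n)
    # Standardize the margin to a z-value
    z = margin / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Area under the standard normal curve between -z and +z
    return std_normal.cdf(z) - std_normal.cdf(-z)


# Values from Example 4.7: true proportion 0.4, margin of 3 percentage points
for n in (100, 1000):
    print(n, round(prob_within(0.4, n, 0.03), 3))
```

For n = 100 this gives roughly 0.46, in line with the table-based result. For n = 1000 the unrounded standard error (about 0.0155) gives roughly 0.947; the 95.5% figure corresponds to rounding the standard error to 0.015, so that z = 2.0.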

From this sampling distribution of the sample proportion, in the context of
repeated sampling, we have an approximate 95% confidence interval for a
population proportion $\pi$:

\[ p \pm 1.96\,\mathrm{SE}(p) \]

where, again, the standard error of the sample proportion is calculated from

\[ \mathrm{SE}(p) = \sqrt{\frac{p(1 - p)}{n}} \]
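
As an illustration of the interval formula, here is a short sketch under stated assumptions: the function name `proportion_ci_95` is hypothetical, and the counts (40 smokers in a sample of 100) are made-up numbers chosen to match the proportion used in Example 4.7, not data from the text.

```python
from math import sqrt


def proportion_ci_95(successes, n):
    """Approximate 95% confidence interval p +/- 1.96 * SE(p)
    for a population proportion (hypothetical helper, large samples only)."""
    p = successes / n                # sample proportion
    se = sqrt(p * (1 - p) / n)       # standard error of the sample proportion
    half_width = 1.96 * se
    return p - half_width, p + half_width


# Illustrative counts only: 40 smokers observed among 100 people sampled
low, high = proportion_ci_95(40, 100)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

With these illustrative counts the interval runs from about 0.30 to about 0.50, which shows how wide the interval remains at n = 100.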

There are no easy ways for small samples; this is applicable only to larger

ESTIMATION OF PROPORTIONS 161
