Basic Statistics

(Barry) #1
CATEGORICAL DATA: PROPORTIONS

illness who are treated medically. Or a comparison may be made of the proportion of
women who have a certain illness with the proportion of men with the same illness.
Or a comparison of the proportion of diabetic amputees who succeed in walking
with prostheses with the proportion of nondiabetic amputees who succeed may be of
interest.
Suppose that we wish to perform a clinical trial comparing two cold remedies. The
first population consists of all patients who might have colds and are given the first
treatment; let $\pi_1$ be the proportion of the first population who recover within 10 days.
Similarly, the second population consists of all patients who might have colds and are
given the second treatment; $\pi_2$ is the proportion of the second population who recover
within 10 days. We take 200 patients, divide them at random into two equal groups,
and give 100 patients the first treatment and 100 patients the second treatment. We
then calculate the proportion in each sample who recover within 10 days, $p_1$ and $p_2$,
and compute the difference $p_1 - p_2$.
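The random division into two equal groups can be sketched in Python. This is a minimal illustration, not a trial protocol: the function names `randomize_two_groups` and `sample_proportion`, the patient numbering 1 to 200, and the fixed seed are all assumptions introduced here.

```python
import random

def randomize_two_groups(patient_ids, seed=None):
    """Randomly split patients into two equal-sized treatment groups."""
    ids = list(patient_ids)
    if len(ids) % 2 != 0:
        raise ValueError("need an even number of patients for two equal groups")
    rng = random.Random(seed)   # seed is only for reproducibility of the sketch
    rng.shuffle(ids)            # every ordering is equally likely
    half = len(ids) // 2
    return ids[:half], ids[half:]

def sample_proportion(recovered_flags):
    """Proportion of a group who recovered; flags are True/False per patient."""
    flags = list(recovered_flags)
    return sum(flags) / len(flags)

# Hypothetical patient numbers 1..200, as in the 200-patient example.
group1, group2 = randomize_two_groups(range(1, 201), seed=0)
print(len(group1), len(group2))
```

Once recovery within 10 days has been recorded for each patient, `sample_proportion` applied to each group gives $p_1$ and $p_2$, and their difference is the statistic studied in the rest of the section.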
It is now necessary to consider the distribution of $p_1 - p_2$. In Section 7.5, on
the difference between two means for continuous data, we noticed that if $\bar{X}_1$ is
normally distributed with mean = $\mu_1$ and with variance = $\sigma^2_{\bar{X}_1}$, and if $\bar{X}_2$ is normally
distributed with mean = $\mu_2$ and with variance = $\sigma^2_{\bar{X}_2}$, then $\bar{X}_1 - \bar{X}_2$ is normally
distributed with mean = $\mu_1 - \mu_2$ and variance = $\sigma^2_{\bar{X}_1} + \sigma^2_{\bar{X}_2}$.
The situation is almost the same with the difference of two sample proportions when
the sample sizes $n_1$ and $n_2$ are large. If $p_1$ is normally distributed with mean = $\pi_1$
and variance = $\pi_1(1 - \pi_1)/n_1$, and if $p_2$ is normally distributed with mean = $\pi_2$ and
variance = $\pi_2(1 - \pi_2)/n_2$, then $p_1 - p_2$ is normally distributed with mean = $\pi_1 - \pi_2$
and variance = $\pi_1(1 - \pi_1)/n_1 + \pi_2(1 - \pi_2)/n_2$. That is, the difference between
two sample proportions, for large sample sizes, is normally distributed with the mean
equal to the difference between the two population proportions and with the variance
equal to the sum of the variances of $p_1$ and $p_2$.
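A small simulation can make this statement concrete. The sketch below repeatedly draws two independent samples and compares the mean and variance of the simulated differences with the values the formula predicts. The population proportions 0.90 and 0.80, the sample sizes of 100, the number of trials, and the name `simulate_differences` are all assumptions chosen for illustration.

```python
import random
import statistics

def simulate_differences(pi1, pi2, n1, n2, trials=5000, seed=1):
    """Draw `trials` pairs of independent samples and return each p1 - p2."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        # Each patient recovers independently with the population proportion.
        p1 = sum(rng.random() < pi1 for _ in range(n1)) / n1
        p2 = sum(rng.random() < pi2 for _ in range(n2)) / n2
        diffs.append(p1 - p2)
    return diffs

# Assumed population values, for illustration only.
pi1, pi2, n1, n2 = 0.90, 0.80, 100, 100
diffs = simulate_differences(pi1, pi2, n1, n2)

theoretical_mean = pi1 - pi2
theoretical_var = pi1 * (1 - pi1) / n1 + pi2 * (1 - pi2) / n2
simulated_mean = statistics.mean(diffs)
simulated_var = statistics.variance(diffs)

print(f"mean:     theory {theoretical_mean:.4f}, simulation {simulated_mean:.4f}")
print(f"variance: theory {theoretical_var:.5f}, simulation {simulated_var:.5f}")
```

With samples of this size the simulated values should land close to the theoretical ones. A histogram of `diffs` would show the approximately normal shape.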
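In our example, if 90 patients of the 100 patients who receive the first treatment
recover within 10 days, $p_1 = .90$; if 80 of the 100 patients who receive the second
treatment recover within 10 days, $p_2 = .80$. Then, $p_1 - p_2 = .10$ is the best estimate
for $\pi_1 - \pi_2$. Since the standard deviation of $p_1 - p_2$ is equal to

$$\sigma_{p_1 - p_2} = \sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}},$$

the usual way of forming 95% confidence intervals gives, for $\pi_1 - \pi_2$,

$$(p_1 - p_2) \pm 1.96\sqrt{\frac{\pi_1(1 - \pi_1)}{n_1} + \frac{\pi_2(1 - \pi_2)}{n_2}}.$$
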
Again the standard deviation must be estimated from the sample using .90 in place of
$\pi_1$ and .80 in place of $\pi_2$; $\sigma_{p_1 - p_2}$ is estimated by

$$\sqrt{\frac{.90(1 - .90)}{100} + \frac{.80(1 - .80)}{100}} = \sqrt{\frac{(.9)(.1)}{100} + \frac{(.8)(.2)}{100}} = \sqrt{.0025} = .05$$
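
The same arithmetic, carried through to the interval itself, can be sketched in Python. This is an illustration under stated assumptions: the function name `two_proportion_ci` is introduced here, and the multiplier 1.96 is the usual normal-approximation value for 95% confidence, which relies on both samples being large.

```python
import math

def two_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Normal-approximation confidence interval for pi1 - pi2.

    x1, x2: numbers who recovered in each sample; n1, n2: sample sizes.
    z=1.96 corresponds to 95% confidence (an assumption of this sketch).
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Estimated standard deviation of p1 - p2, with p1 and p2 in place of pi1 and pi2.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, se, (diff - z * se, diff + z * se)

# Values from the cold-remedy example: 90 of 100 and 80 of 100 recover.
diff, se, (lower, upper) = two_proportion_ci(90, 100, 80, 100)
print(f"difference = {diff:.2f}, estimated SD = {se:.3f}")
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

For these data the interval is .10 ± 1.96(.05), roughly .002 to .198.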