Basic Statistics

CATEGORICAL DATA: PROPORTIONS

In Section 10.1, methods for calculating the population mean and variance from
a single population are presented. Here, the formulas for the population parameters
are given before those for the sample. The relationship between the sample statistics
and the population parameters is presented in Section 10.2. The binomial distribution
is introduced, and we explain that in this book only the normal approximation to the
binomial distribution is covered. Section 10.3 covers the use of the normal
approximation. Confidence limits for a single population proportion are given in Section 10.4
and for the difference between two proportions in Section 10.5. Tests of hypotheses
are discussed in Section 10.6, and the needed sample size for testing two proportions
is given in Section 10.7. When either confidence intervals or tests of hypotheses are
given, we assume that simple random samples have been taken. Section 10.8 covers
data entry of categorical data and typical output from statistical programs.


10.1 SINGLE POPULATION PROPORTION


Here, we consider the young patients who underwent the cleft palate operation to be
the entire population; we have 5 who had a complication (failures) and 15 who had no
complications (successes). To make it simpler to count the number of successes and
failures, we shall code a success as a 1 and a failure as a 0. We then have 15 ones and
5 zeros and N = 20 is the total number of observations in the population. The data
are usually reported in terms of the proportion of successes (the number of successes
over the total number of observations in the population), which is why the successes,
rather than the failures, are coded as 1. For our population of young patients, this
proportion of successes is 15/20 = .75, and .75 is also the mean of the population,
since we have divided the sum of the numerical values of the observations by N. That
is, if we add the 15 ones and 5 zeros we get 15, and 15 divided by 20 is .75. Similarly,
if we count the number of successes and divide by the number of observations, we get
15 over 20, or .75. The population mean is called π. In this population, the population
mean is π = .75. This is also the population proportion of successes. The proportion of
successes and the proportion of failures must add to 1 since those are the only two
possible outcomes. Hence, the proportion of failures is 1 - π = .25.
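
The arithmetic above can be checked with a short sketch in Python. This is an illustration only, not part of the original text; the variable names (outcomes, pi_pop, prop_failures) are assumptions chosen for this example, and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Coded outcomes for the population described above:
# 1 = success (no complication), 0 = failure (complication).
outcomes = [1] * 15 + [0] * 5

N = len(outcomes)                     # population size
pi_pop = Fraction(sum(outcomes), N)   # mean of the 0/1 codes = proportion of successes
prop_failures = 1 - pi_pop            # the two proportions must add to 1

print(N)                     # 20
print(float(pi_pop))         # 0.75
print(float(prop_failures))  # 0.25
```

Because every observation is either 0 or 1, summing the codes is the same as counting the successes, which is why the mean and the proportion of successes coincide.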


10.1.1 Graphical Displays of Proportions

Graphically, data of this type are commonly displayed as pie charts, such as the one in
Figure 10.1, or as bar charts, as shown in Figure 10.2. Pie charts are readily interpretable
when there are only two outcomes. Pie charts should be used only when there is
a small number of categories and the sum of the categories has some meaning (see
Cleveland [1985] or Good and Hardin [2003]). Pie charts and bar charts are widely
available in software packages.
In Figure 10.2 it can be seen that 15 of the patients had a successful operation.
Bar charts have the advantage that they can be used with any number of categories.
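
As a rough illustration of what a bar chart displays, the counts behind it can be tabulated and drawn as text. This is a minimal sketch using only the Python standard library, not the output of any particular statistical package; the category labels and the helper text_bar_chart are hypothetical names introduced for this example.

```python
from collections import Counter

# Same coding as in Section 10.1: 1 = success, 0 = failure.
outcomes = [1] * 15 + [0] * 5
labels = {1: "Success", 0: "Failure"}  # display labels are an assumption

# Count how many observations fall in each category.
counts = Counter(labels[x] for x in outcomes)

def text_bar_chart(counts):
    """Return one line per category, with a bar whose length equals the count."""
    width = max(len(name) for name in counts)
    return [f"{name:<{width}} | {'#' * n} {n}" for name, n in counts.items()]

for line in text_bar_chart(counts):
    print(line)
```

The same function works unchanged for three or more categories, which reflects the advantage of bar charts noted above.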