From what has been said in the preceding chapters, it is apparent that we are going to be
very much concerned with distributions—distributions of data, hypothetical distributions of
populations, and sampling distributions. Of all the possible forms that distributions can take,
the class known as the normal distribution is by far the most important for our purposes.
Before elaborating on the normal distribution, however, it is worth a short digression to
explain just why we are so interested in distributions in general, not just the normal
distribution. The critical factor is that there is an important link between distributions and
probabilities. If we know something about the distribution of events (or of sample statistics),
we know something about the probability that one of those events (or statistics) will
occur. To see the issue in its simplest form, take the lowly pie chart. (This is the only time
you will see a pie chart in this book, because I find it very difficult to compare little slices
of pie in different orientations to see which one is larger. There are much better ways to
present data. However, the pie chart serves a useful purpose here.)
The pie chart shown in Figure 3.1 is taken from a report by the Joint United Nations
Programme on HIV/AIDS (UNAIDS) and was retrieved from
http://data.unaids.org/pub/EpiReport/2006/2006_EpiUpdate_en.pdf in September 2007. It shows the source of AIDS/HIV
infection for people in Eastern Europe and Central Asia. One of the most remarkable things
about this chart is that it shows that in that region of the world the great majority of
AIDS/HIV cases result from intravenous drug use. (This is not the case in Latin America,
the United States, or South and South-East Asia, where the corresponding percentage is
approximately 20%, but we will focus on the data at hand.)
From Figure 3.1 you can see that 67% of people with HIV contracted it from injected
drug use (IDU), 4% of the cases involved sexual contact between men (MSM), 5% of cases
were among commercial sex workers (CSW), 7% of cases were among clients of commercial
sex workers (CSW-cl), and 17% of cases were unclassified or from other sources. You
can also see that the percentage of cases in each category is directly reflected in the
percentage of the area of the pie that each wedge occupies: the area taken up by each segment
is directly proportional to the percentage of individuals in that segment. Moreover, if we
declare that the total area of the pie is 1.00 unit, then the area of each segment is equal to
the proportion of observations falling in that segment.
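The correspondence between percentages and areas can be checked with a few lines of arithmetic. The following is only a minimal sketch in Python, using the percentages as labelled in Figure 3.1; the variable names (percentages, areas, angles) are illustrative assumptions and are not part of the original report.

```python
import math

# Percentages as labelled in Figure 3.1 (Eastern Europe and Central Asia).
percentages = {
    "IDU": 67,
    "MSM": 4,
    "CSW": 5,
    "CSW clients": 7,
    "All others": 17,
}

total = sum(percentages.values())

# If the whole pie is declared to have an area of 1.00 unit, the area of
# each wedge equals the proportion of cases falling in that category.
areas = {label: pct / total for label, pct in percentages.items()}

# The central angle of each wedge is the same proportion of 360 degrees.
angles = {label: area * 360 for label, area in areas.items()}

for label in percentages:
    print(f"{label:12s} area = {areas[label]:.2f}  angle = {angles[label]:5.1f} degrees")

# The wedge areas must account for the entire pie.
assert math.isclose(sum(areas.values()), 1.0)
```

Dividing by the total rather than by a hard-coded 100 keeps the areas summing to 1.00 even if rounded percentages happen to add up to slightly more or less than 100.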
It is easy to go from speaking about areas to speaking about probabilities. The concept
of probability will be elaborated in Chapter 5, but even without a precise definition of
probability we can make an important point about areas of a pie chart. For now, simply think of
Figure 3.1 Pie chart showing sources of HIV infections in different populations
(Eastern Europe and Central Asia: IDU 67%, MSM 4%, CSW 5%, CSW clients 7%, all others 17%.
IDU: injecting drug users; MSM: men having sex with men; CSW: commercial sex workers.)