Introductory Biostatistics


tributions in nature are normal. Strictly speaking, that is false. Even more
strictly speaking, they cannot be exactly normal. Some, such as heights of adults
of a particular gender and race, are amazingly close to normal, but never exactly.
The normal distribution is extremely useful in statistics, but for a very
different reason, not because it occurs in nature. Mathematicians proved that for
samples that are "big enough," values of their sample means, x̄'s (including
sample proportions as a special case), are approximately distributed as normal,
even if the samples are taken from really strangely shaped distributions. This
important result is called the central limit theorem. It is as important to
statistics as the understanding of germs is to the understanding of disease. Keep
in mind that "normal" is just a name for this curve; if an attribute is not
distributed normally, it does not imply that it is "abnormal." Many statistics texts
provide statistical procedures for finding out whether a distribution is normal,
but they are beyond the scope of this book.
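The central limit theorem can be checked informally by simulation. The sketch below is only an illustration under stated assumptions: the parent distribution (an exponential with mean 1 and standard deviation 1), the sample size of 50, the 5000 repetitions, and the helper names sample_means and fraction_within are all choices made here, not taken from the text. It draws repeated samples from a strongly skewed distribution and reports how often the sample mean falls within one and two standard errors of the population mean, for comparison with the fractions a normal curve would give.

```python
import random
import statistics


def sample_means(draw, n, repeats, seed=0):
    """Return `repeats` sample means, each computed from n values of draw(rng)."""
    rng = random.Random(seed)
    return [
        statistics.fmean([draw(rng) for _ in range(n)])
        for _ in range(repeats)
    ]


def fraction_within(values, center, half_width):
    """Fraction of values lying within half_width of center."""
    return sum(abs(v - center) <= half_width for v in values) / len(values)


def draw_exponential(rng):
    """One draw from an exponential distribution with mean 1 (assumed parent)."""
    return rng.expovariate(1.0)


n = 50                      # assumed sample size
means = sample_means(draw_exponential, n=n, repeats=5000)

pop_mean = 1.0              # mean of the assumed exponential parent
std_error = 1.0 / n ** 0.5  # population sd divided by sqrt(n)

within_1 = fraction_within(means, pop_mean, std_error)
within_2 = fraction_within(means, pop_mean, 2 * std_error)

# A normal curve puts about 0.683 and 0.954 of its area in these ranges.
print(f"within 1 standard error:  {within_1:.3f}")
print(f"within 2 standard errors: {within_2:.3f}")
```

Although the parent distribution is far from symmetric, the two printed fractions should land close to the normal-curve values; rerunning with a much smaller n would be expected to show a poorer match.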
From now on, to distinguish samples from populations (a sample is a
subgroup of a population), we adopt the set of notations defined in Table 3.7.
Quantities in the Population column (μ, σ^2, and π) are parameters representing
numerical properties of populations: μ and σ^2 for continuously measured
information and π for binary information. Quantities in the Sample column
(x̄, s^2, and p) are statistics representing summarized information from samples.
Parameters are fixed (constants) but unknown, and each statistic can be used as
an estimate for the parameter listed in the same row of Table 3.7. For
example, x̄ is used as an estimate of μ; this topic is discussed in more detail in
Chapter 4. A major problem in dealing with statistics such as x̄ and p is that if
we take a different sample, even using the same sample size, values of a
statistic change from sample to sample. The central limit theorem tells us that if
sample sizes are fairly large, values of x̄ (or p) in repeated sampling have a very
nearly normal distribution. Therefore, to handle variability due to chance, so as
to be able to declare, for example, that a certain observed difference is more
than would occur by chance and is therefore real, we first have to learn how to
calculate probabilities associated with normal curves.
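The distinction between fixed parameters and varying statistics can be made concrete with a small simulation. Everything in this sketch is assumed for illustration: the population of 10,000 invented measurements, the cutoff of 140 used to define a binary characteristic, the sample size of 100, and the helper name summarize. In practice the parameters would be unknown; they are computable here only because the whole population was generated artificially.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population of 10,000 measurements (invented for illustration).
population = [rng.gauss(120, 15) for _ in range(10_000)]
CUTOFF = 140  # assumed threshold defining the binary characteristic

# Parameters: fixed numerical properties of the population.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)  # divides by N
pi_pop = sum(v > CUTOFF for v in population) / len(population)


def summarize(sample, cutoff=CUTOFF):
    """Return the statistics (x_bar, s2, p) for one sample."""
    x_bar = statistics.fmean(sample)
    s2 = statistics.variance(sample)  # divides by n - 1
    p = sum(v > cutoff for v in sample) / len(sample)
    return x_bar, s2, p


# Five different samples, all of the same size, from the same population.
summaries = [summarize(rng.sample(population, 100)) for _ in range(5)]

print(f"parameters: mu={mu:.2f}  sigma^2={sigma2:.1f}  pi={pi_pop:.3f}")
for i, (x_bar, s2, p) in enumerate(summaries, start=1):
    print(f"sample {i}:   x_bar={x_bar:.2f}  s^2={s2:.1f}  p={p:.3f}")
```

Each printed row estimates the same three parameters, yet the values of x̄, s^2, and p differ from row to row; this is the sample-to-sample variability that the normal curve is used to quantify.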
The term normal curve, in fact, refers not to one curve but to a family of
curves, each characterized by a mean μ and a variance σ^2. In the special case
where μ = 0 and σ^2 = 1, we have the standard normal curve. For a given μ and


TABLE 3.7  Notation

Quantity              Sample              Population

Mean                  x̄ (x-bar)           μ (mu)
Variance              s^2 (s squared)     σ^2 (sigma squared)
Standard deviation    s                   σ
Proportion            p                   π (pi)


122 PROBABILITY AND PROBABILITY MODELS
