sum areas in exactly the same way that we did in the pie chart. When we move to more
common distributions, particularly the normal distribution, the principles of areas,
percentages, probabilities, and the addition of areas or probabilities carry over almost
without change.
3.1 The Normal Distribution
Now we’ll move closer to the normal distribution. I stated earlier that the normal distribution
is one of the most important distributions we will encounter. There are several reasons
for this:
- Many of the dependent variables with which we deal are commonly assumed to be normally
distributed in the population. That is to say, we frequently assume that if we were
to obtain the whole population of observations, the resulting distribution would closely
resemble the normal distribution.
- If we can assume that a variable is at least approximately normally distributed, then the
techniques that are discussed in this chapter allow us to make a number of inferences
(either exact or approximate) about values of that variable.
- The theoretical distribution of the hypothetical set of sample means obtained by drawing
an infinite number of samples from a specified population can be shown to be approximately
normal under a wide variety of conditions. Such a distribution is called the
sampling distribution of the mean and is discussed and used extensively throughout the
remainder of this book.
- Most of the statistical procedures we will employ have, somewhere in their derivation,
an assumption that the population of observations (or of measurement errors) is normally
distributed.
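The third point above, about the sampling distribution of the mean, can be illustrated with a small simulation. The sketch below is only an illustration under assumed settings: the exponential population, the sample size of 30, the 2,000 samples, and the function name `sample_means` are all choices made for this example and do not come from the text. It draws repeated samples from a clearly skewed population and summarizes the resulting sample means.

```python
import random
import statistics


def sample_means(n_samples, sample_size, seed=0):
    """Draw n_samples samples from a skewed (exponential, mean 1) population
    and return the mean of each sample. All settings are illustrative."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


means = sample_means(n_samples=2000, sample_size=30)

center = statistics.fmean(means)   # should sit near the population mean, 1.0
spread = statistics.stdev(means)   # should sit near 1 / sqrt(30), about 0.18

# In a normal distribution roughly 68% of values lie within one
# standard deviation of the mean; check how close the sample means come.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means: {center:.3f}")
print(f"sd of sample means:   {spread:.3f}")
print(f"share within 1 sd:    {within_one_sd:.3f}")
```

Even though the individual observations here are far from normal, the sample means cluster symmetrically around the population mean, which is the behavior the bullet describes. A finite simulation can only approximate the theoretical distribution, which is defined over an infinite number of samples.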
To introduce the normal distribution, we will look at one additional data set that is
approximately normal (and would be even closer to normal if we had more observations). The
data we are going to look at were collected using the Achenbach Youth Self Report form
(Achenbach, 1991b), a frequently used measure of behavior problems that produces scores
on a number of different dimensions. We are going to focus on the dimension of Total
Behavior Problems, which represents the total number of behavior problems reported by
the child (weighted by the severity of the problem). (Examples of Behavior Problem
categories are “Argues,” “Impulsive,” “Shows off,” and “Teases.”) Figure 3.3 is a histogram of
data from 289 junior high school students. A higher score represents more behavior
problems. You can see that this distribution has a center very near 50 and is fairly symmetrically
distributed on each side of that value, with the scores ranging between about 25 and 75.
The standard deviation of this distribution is approximately 10. The distribution is not
perfectly even; it has some bumps and valleys, but overall it is fairly smooth, rising in the
center and falling off at the ends. (The actual mean and standard deviation for this
particular sample are 49.1 and 10.56, respectively.)
One thing that you might note from this distribution is that if you add the frequencies
of subjects falling in the intervals 52–54 and 54–56, you will find that 54 students obtained
scores between 52 and 56. Because there are 289 observations in this sample, 54/289 =
19% of the observations fell in this interval. This illustrates the comments made earlier on
the addition of areas.
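The arithmetic in this paragraph can be checked in a few lines. This is a minimal sketch: the helper name `proportion` is hypothetical, and only the combined count of 54 and the sample size of 289 come from the text (the separate frequencies for the two intervals are not given here, so they are not used).

```python
def proportion(count, total):
    """Return the share of observations that fall in an interval."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count / total


n_students = 289       # sample size reported in the text
in_52_to_56 = 54       # combined frequency of the 52-54 and 54-56 intervals

share = proportion(in_52_to_56, n_students)
print(f"{share:.1%}")  # prints 18.7%, which rounds to the 19% quoted above
```

Because the intervals do not overlap, adding their frequencies before dividing gives the same result as adding their separate proportions, which is the sense in which areas are added.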
We can take this distribution and superimpose a normal distribution on top of it. This is
frequently done to casually evaluate the normality of a sample. The smooth distribution
superimposed on the raw data in Figure 3.4 is a characteristic normal distribution. It is a