population, but also many statistical test procedures are based on the assumption of
normality—t-test, F-test, and Pearson correlation. The sampling distributions of a wide
range of descriptive and test statistics have a normal probability distribution. A key result in probability theory, the Central Limit Theorem, describes the sampling distribution of the mean. The theorem states that as the size of a sample increases, the shape of the sampling distribution approaches normal, whatever the shape of the parent population. The significance of this theorem is that it allows us to use the normal probability distribution even with sample means from populations which
do not have a normal distribution. For example, binomial samples (counts, such as
true/false) and proportions approximate a normal probability distribution when the
sample sizes are large.
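As a minimal sketch of the Central Limit Theorem at work (the function name and the choice of a Bernoulli parent population are illustrative, not from the text), we can draw repeated true/false samples and watch the distribution of their means settle into a bell shape around the population proportion:

```python
import random
import statistics

def sample_means(n, reps=5000, p=0.3):
    """Means of `reps` samples of size n drawn from a Bernoulli(p) parent,
    a decidedly non-normal (two-valued) population."""
    return [statistics.mean(random.random() < p for _ in range(n))
            for _ in range(reps)]

random.seed(1)
means = sample_means(n=50)
# The sampling distribution of the mean centres on p, with spread
# close to sqrt(p * (1 - p) / n), and its histogram is approximately
# bell-shaped even though the parent population is not.
print(round(statistics.mean(means), 3))
print(round(statistics.pstdev(means), 3))
```

Increasing `n` tightens the spread of the sample means; the approach to normality is what licenses normal-theory tests on means from non-normal populations.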
We can think of a normal distribution as a mathematical description of an idealized
population with the following important characteristics:
- It is so large that for practical purposes it can be regarded as unlimited in size.
- Measurements must be on an interval or ratio scale and have at least an underlying theoretical continuous distribution.
- Values are symmetrically distributed about the mean.
- Values close to the mean occur relatively more frequently than those further away, the frequency falling off in a well-defined bell-shaped curve.
- Measurement units can be standardized in terms of standard deviation units (a measure of spread about the mean), sometimes called Z-scores.
- About 68 per cent (68.26 per cent) of the measures in a normal distribution lie between −1.0 SD below and +1.0 SD above the mean. The mean is 0 if measures are standardized.
- About 95 per cent (95.44 per cent) of measures lie between −2 and +2 SDs below and above the mean.
- About 99 per cent (99.74 per cent) of measures lie between −3 and +3 SDs below and above the mean.
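These three percentages can be verified numerically: for a standard normal variable, P(−k SD &lt; Z &lt; +k SD) equals erf(k/√2). A small sketch using only the Python standard library (the function name is ours):

```python
import math

def central_prob(k):
    """P(-k < Z < +k) for a standard normal Z, via the error function:
    P(|Z| < k) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

# Reproduces the 68 / 95 / 99.7 per cent figures quoted above
# (small differences are rounding only).
for k in (1, 2, 3):
    print(k, round(100 * central_prob(k), 2))
```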
A common belief is that there is one normal curve. This is not so. There are many different normal curves, each described by specifying two parameters: where the curve is centred, that is the mean μ, and how much the distribution spreads out about its centre, the standard deviation σ. With a specified mean and standard deviation, the probability that a continuous random variable, X, falls in a defined interval on the X axis is equal to the area under the normal density curve over that interval. The vertical axis is referred to as density, and is related to the frequency, or probability of occurrence, of the variable X.
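This area-under-the-curve idea can be sketched directly. The helpers below (names are ours) standardize X to a Z-score, z = (x − μ)/σ, and use the standard normal CDF, Φ(z) = (1 + erf(z/√2))/2, so one reference curve serves every choice of μ and σ:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Area under the N(mu, sigma) density curve to the left of x."""
    z = (x - mu) / sigma          # standardize to a Z-score
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_prob(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b) for X ~ N(mu, sigma): area under the curve between a and b."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)
```

For example, with a hypothetical test scaled to μ = 100 and σ = 15, `interval_prob(85, 115, 100, 15)` returns about 0.68, the one-standard-deviation band noted earlier.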
The value of the mean μ is the particular value on the X axis lying directly below the centre of the distribution. The value of the standard deviation σ is the distance along the X axis from the mean to the curve's point of inflection (where the curve changes from bending downwards to bending upwards).
Whenever we try to construct a table of normal distribution densities (probabilities of values falling within a defined range), we face a problem: each normal distribution depends on its own particular μ and σ. Rather
than tabulate separate tables for each possible combination of μ and σ, statisticians have
chosen a particular normal curve as a reference curve. This is the standard normal or Z
curve. The normal distribution Z scores and associated probability values are shown in