Basic Statistics

72 THE NORMAL DISTRIBUTION


However, if we do not know the population standard deviation and we wish
to perform some of the analyses described in subsequent chapters, we will have to
assume that the data are normally distributed.


6.4 EXAMINING DATA FOR NORMALITY

In this section we present four graphical methods of determining whether the
variables in a data set are normally distributed. These graphical methods have the
advantage not only of detecting nonnormal data but also of giving us some insight
into what to do to make the data closer to being normally distributed.

6.4.1 Using Histograms and Box Plots

One commonly used method for examining data to see whether they are at least
approximately normally distributed is to look at a histogram of the data.
Distributions that appear to be markedly asymmetric or have extremely high tails are
not normally distributed, and the proportion of the area between any two points
cannot be accurately estimated from the normal distribution. Here we examine the
data given in Problem 5.2, comparing them first with the normal distribution by
means of a histogram.

In Figure 6.9, a histogram displaying the systolic blood pressures of 48 younger
adult males is presented. The distribution is skewed to the right (the right tail is
longer than the left). The sample mean is X̄ = 137.3 mmHg and the standard
deviation is s = 32.4 mmHg. A normal distribution with that mean and standard
deviation is displayed along with the histogram. Many statistical programs allow
the user to superimpose a normal distribution on a histogram. From Figure 6.9, the
investigator may see whether the normal distribution should be used to estimate
areas between two points. The fit is obviously poor, for the histogram's bars extend
much higher than the normal curve between 100 and 130 mmHg and do not reach the
curve between 130 and 190 mmHg. In examining histograms, the investigator must
expect some discrepancies. This is especially true for small samples, where the
plotted percents may not fit a normal curve well even if the data were sampled from
a normal distribution.

A box plot can be examined for the same data set (see Figure 6.10). The distance
between the median and Q3 appears to be about twice the distance between the median
and Q1, a clear indication that the distribution is skewed to the right.
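
The comparisons read off the histogram and the box plot can also be made
numerically. The following is a minimal sketch in Python, not a procedure from the
text. It computes the area that a normal curve assigns to an interval, the observed
proportion of a sample in that interval, and the distances from the median to each
quartile. The only values taken from the text are the mean (137.3) and standard
deviation (32.4); the list `pressures` is made-up illustrative data, not the 48
readings from Problem 5.2, and the function names are hypothetical.

```python
from statistics import NormalDist, mean, quantiles, stdev


def normal_area(mu, sd, low, high):
    """Area under a normal curve with mean mu and sd between low and high."""
    dist = NormalDist(mu=mu, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


def observed_proportion(values, low, high):
    """Proportion of observations with low <= value < high."""
    return sum(low <= v < high for v in values) / len(values)


def quartile_distances(values):
    """Return (median - Q1, Q3 - median); unequal distances suggest skewness."""
    q1, median, q3 = quantiles(values, n=4)
    return median - q1, q3 - median


# Area between 100 and 130 mmHg under the normal curve quoted in the text.
text_area = normal_area(137.3, 32.4, 100, 130)
print(f"normal area, 100 to 130 mmHg (mean 137.3, sd 32.4): {text_area:.3f}")

# Hypothetical right-skewed readings in mmHg (NOT the data from Problem 5.2).
pressures = [98, 102, 105, 108, 110, 112, 115, 118, 120, 122,
             125, 128, 131, 136, 142, 150, 161, 175, 190, 214]

# Compare the sample with a normal curve that has the sample's own mean and sd.
fitted_area = normal_area(mean(pressures), stdev(pressures), 100, 130)
observed = observed_proportion(pressures, 100, 130)
print(f"fitted normal area:  {fitted_area:.3f}")
print(f"observed proportion: {observed:.3f}")

lower, upper = quartile_distances(pressures)
print(f"median - Q1 = {lower:.1f}, Q3 - median = {upper:.1f}")
```

An observed proportion well above the fitted normal area corresponds to histogram
bars that rise above the curve, and an upper quartile distance much larger than the
lower one corresponds to the lopsided box described for Figure 6.10.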

6.4.2 Using Normal Probability Plots or Quantile-Quantile Plots

Another commonly used method of checking whether a particular variable is normally
distributed is to examine the cumulative distribution of the data and compare it with
that of a cumulative normal distribution. Table 4.5 illustrated the computation of
the cumulative frequency in percent; in it, the percent cumulative frequency gave
the percentage of miners with hemoglobin levels below the upper limit of each class
interval. Rather than plot a histogram, the investigator can plot the upper limit of
each class interval on the horizontal axis and the corresponding cumulative percent
on the vertical axis.
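
The points for such a plot can be computed as in the following sketch, which also
lists the cumulative percent that a normal distribution with the sample mean and
standard deviation would give at each upper class limit. The measurements and
class limits are made up for illustration (they are not the values in Table 4.5),
and the function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def cumulative_percents(values, upper_limits):
    """Percent of observations below each upper class limit."""
    n = len(values)
    return [100 * sum(v < limit for v in values) / n for limit in upper_limits]


def normal_cumulative_percents(values, upper_limits):
    """Percent below each limit under a normal curve with the sample mean and sd."""
    dist = NormalDist(mu=mean(values), sigma=stdev(values))
    return [100 * dist.cdf(limit) for limit in upper_limits]


# Hypothetical measurements and upper class limits, for illustration only.
values = [10.2, 11.1, 11.8, 12.0, 12.4, 12.9, 13.1, 13.3, 13.6, 14.0,
          14.2, 14.5, 14.9, 15.3, 16.1]
upper_limits = [11, 12, 13, 14, 15, 16, 17]

observed = cumulative_percents(values, upper_limits)
expected = normal_cumulative_percents(values, upper_limits)

# Each row is one plotted point: upper limit (horizontal), cumulative % (vertical).
for limit, obs, exp in zip(upper_limits, observed, expected):
    print(f"below {limit:>2}: observed {obs:5.1f}%   normal {exp:5.1f}%")
```

Large gaps between the two columns would suggest that the data depart from
normality, subject to the same caution about small samples that applies to
histograms.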
