Basic Statistics


76 THE NORMAL DISTRIBUTION


an approximately normal distribution. In this process, the same transformation is
applied to each observation of a given variable. Then, all analyses are performed on
the transformed data and all results apply to the transformed data.

6.5.1 Finding a Suitable Transformation

Fortunately, finding a suitable transformation to obtain approximately
normal data is not time-consuming with typical statistical programs. Some
programs simply provide a list of transformations to choose from. The most
commonly provided transformations for transforming data to normality are
logarithmic and square-root transformations. Other statistical programs also
allow users to define their own transformation. When this option is not
available, user-defined transformations can first be performed using
spreadsheet programs, and then the data can be transferred to a statistical
program.
A commonly used transformation is taking the logarithm to the base 10 of each
observation in a distribution that is skewed to the right. Note that the logarithm Y
of a number X satisfies the relationship X = 10^Y. Thus, the logarithm of X is the
power Y to which 10 must be raised to produce X. The logarithm of X is usually
abbreviated as log(X). The log(10) is 1 since 10 = 10^1, and the log(100) is 2 since
100 = 10^2. The log of 1 is 0 since 1 = 10^0. Note that as X increases from 1 to 10,
log(X) increases from 0 to 1, and if X goes from 10 to 100, log(X) goes from 1 to 2.
For larger values of X, it takes an even greater increase in X for log(X) to increase
much. For a distribution that is skewed to the right, taking the logarithm of X has
the effect of reducing the length of the upper tail of the distribution, making it more
nearly symmetrical.
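
As a rough illustration of the paragraph above, the following Python sketch applies a base-10 logarithm to a small right-skewed sample and compares a simple moment-based skewness measure before and after. The sample values and the helper name skewness are made up for this example and do not come from the text.

```python
import math

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed SD."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed sample (illustrative values only).
x = [1, 2, 3, 5, 8, 13, 40, 100]

# The same transformation is applied to every observation.
log_x = [math.log10(v) for v in x]

skew_raw = skewness(x)
skew_log = skewness(log_x)
print(f"skewness before: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

For this sample the skewness after the transformation is smaller than before, which is the shortening of the upper tail described above. Other samples may respond differently.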
The logarithm of any X that is ≤ 0 is undefined and the logarithm of any X < 1
but > 0 is negative. Note that ≤ signifies less than or equal to and ≥ signifies greater
than or equal to. When X is < 1, a small positive constant A can be added first to
X so that the logarithm of X plus a constant A [or log(X + A)] results in a positive
number.
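
A minimal sketch of the log(X + A) idea follows. The function name shifted_log10, the data, and the choice A = 1 are assumptions made for illustration. Note that log(X + A) is positive only when X + A exceeds 1, so A has to be chosen with the smallest observation in mind.

```python
import math

def shifted_log10(values, a):
    """Return log10(x + a) for each x; `a` plays the role of the constant A."""
    return [math.log10(v + a) for v in values]

# Hypothetical observations between 0 and 1 (illustrative values only).
x = [0.0, 0.2, 0.5, 0.9]

# A = 1 is an assumed choice: it guarantees X + A >= 1, so no result is negative.
shifted = shifted_log10(x, a=1.0)
print(shifted)
```

Without the shift, math.log10(0.0) raises a ValueError and math.log10(0.5) is negative, which matches the undefined and negative cases described above.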
If the numbers being transformed are all quite large, a positive constant is often
subtracted first. This has the result of increasing the effect of taking the log
transformation.
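
The effect of subtracting a constant first can be sketched as follows. The data and the constant c = 1000 are hypothetical, and c must stay below the smallest observation so that every X - c remains positive.

```python
import math

def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the cubed SD."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical large, right-skewed observations (illustrative values only).
x = [1001, 1002, 1003, 1005, 1008, 1013, 1040, 1100]
c = 1000  # assumed constant; must be smaller than min(x)

# Log alone barely changes the shape because all values are close in ratio.
skew_plain = skewness([math.log10(v) for v in x])

# Subtracting c first spreads the values over a wider ratio range.
skew_shifted = skewness([math.log10(v - c) for v in x])

print(f"log10(X): {skew_plain:.2f}, log10(X - c): {skew_shifted:.2f}")
```

In this sample the log of the raw values leaves most of the skewness in place, while subtracting the constant first reduces it much more.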
One general strategy for finding an appropriate transformation is the use of power
transformations (see Tukey [1977] or Afifi et al. [2004]). Consider the effects of
taking X^P for various values of P. With P = 2, large values of X become much
larger. With P = 1, there is no change in the value of X (X^1 = X). With P = 0.5 we
have the square-root transformation, which has the effect of reducing the value of large
values of X. The square-root transformation is often used on data that are counts.
Taking P = 0 results in the logarithm transformation (see Cleveland [1993] for an
explanation), which has the effect of reducing large values of X also. With P = -1
we have 1/X, which changes large values of X into small values and reverses the
ordering of the data.
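
The values of P discussed above can be tried side by side with a short Python sketch. The function name power_transform and the sample values are assumptions for illustration. Following the convention in the text, P = 0 is handled as the base-10 logarithm rather than as X^0.

```python
import math

def power_transform(values, p):
    """Apply X^P to each value; P = 0 is treated as log10(X), as in the text."""
    if p == 0:
        return [math.log10(v) for v in values]
    return [float(v) ** p for v in values]

# Hypothetical positive observations (illustrative values only).
x = [1, 4, 9, 100]

for p in (2, 1, 0.5, 0, -1):
    print(f"P = {p:>4}: {power_transform(x, p)}")
```

With P = -1 the transformed values come out in decreasing order even though x is increasing, which is the reversal of ordering mentioned above.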
A general rule is that with a distribution that is skewed to the right, as was systolic
blood pressure, we should try values of P < 1; the reduction in skewness to the right
