Encyclopedia of Environmental Science and Engineering, Volume I and II


STATISTICAL METHODS FOR ENVIRONMENTAL SCIENCE


prediction of one of the variables. When rainfall measure-
ments are taken in conjunction with a number of other vari-
ables, such as temperature, pressure, and so on, for example,
the purpose is usually to predict the rainfall as a function of
the other variables. Thus, it is possible to view one variable
as the dependent variable for a priori reasons, even though
the data do not require such a view.
In these situations, the investigator very often has one of
two aims. He may wish to predict one of the variables from
all of the other variables. Or he may wish to consider one
variable as a function of another variable with the effect of
all the other variables partialled out. The first situation calls
for the use of multiple correlation. In the second, the appro-
priate statistic is the partial correlation coefficient.
Multiple correlation coefficients are used in an effort to
improve prediction by combining a number of variables to
predict the variable of interest. The formula for three vari-
ables is

r_{1.23} = \sqrt{\dfrac{r_{12}^{2} + r_{13}^{2} - 2r_{12}r_{13}r_{23}}{1 - r_{23}^{2}}}    (32)

Generalizations are available for larger numbers of variables.
If the variables are relatively independent of each other, mul-
tiple correlation may improve prediction. However, it should
be obvious that this process reaches an upper limit since
additional variables, if they are to be of any value, must show
a reasonable correlation with the variable of interest, and the
total amount of variance to be predicted is fixed. Each addi-
tional variable can therefore only have a limited effect.
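As a small numerical sketch of Eq. (32), the function below combines three pairwise correlations (the values used are hypothetical illustrations, not data from the text) and shows the diminishing-returns effect of adding a redundant predictor:

```python
import math

def multiple_r(r12, r13, r23):
    """Multiple correlation of variable 1 on variables 2 and 3,
    computed from the three pairwise correlations (Eq. 32)."""
    num = r12**2 + r13**2 - 2 * r12 * r13 * r23
    return math.sqrt(num / (1 - r23**2))

# Two moderately predictive, independent predictors combine well:
print(multiple_r(0.5, 0.5, 0.0))   # ≈ 0.707, better than either alone
# Two highly redundant predictors (r23 = 0.9) add little over one:
print(multiple_r(0.5, 0.5, 0.9))   # barely above 0.5
```

Because the total variance to be predicted is fixed, the second call illustrates the upper limit described above: a predictor correlated 0.9 with an existing one raises the multiple correlation only slightly.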
Partial correlation is used to partial out the effect of one
or more variables on the correlation between two other vari-
ables. For example, suppose it is desired to study the relation-
ship between body weight and running speed, independent
of the effect of height. Since height and weight are corre-
lated, simply doing a standard correlation between running
speed and weight will not solve the problem. However, com-
puting a partial correlation, with the contribution of height
partialled out, will do so. The partial correlation formula for
three variables is

r_{12.3} = \dfrac{r_{12} - r_{13}r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}    (33)

where r_{12.3} gives the correlation of variables 1 and 2, with
the contribution of variable 3 held constant. This formula
may also be extended to partial out the effect of additional
variables.
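Eq. (33) can be sketched numerically. The correlation values below for the running-speed example are hypothetical, chosen only to show how a raw correlation can shrink once a shared variable is held constant:

```python
import math

def partial_r(r12, r13, r23):
    """Partial correlation of variables 1 and 2 with the
    contribution of variable 3 held constant (Eq. 33)."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

# Hypothetical values: speed-weight r12 = 0.4, but weight and speed
# both correlate with height (r13 = 0.6, r23 = 0.7)
print(partial_r(0.4, 0.6, 0.7))   # near zero: the raw 0.4 was
                                  # largely due to height
```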
Let us return for a moment to a consideration of the pop-
ulation correlation matrix, ρ. It may be that the investigator
has some a priori reason for believing that certain relation-
ships exist among the correlations in this matrix. Suppose,
for example, that there is reason to believe that several variables
are heavily dependent on wind velocity and that another set
of variables are dependent on temperature. Such a pattern of

underlying relations would result in systematic patterns of
high and low correlations in the population matrix, which
should be reflected in the observed correlation matrix. If the
obtained correlation matrix is partitioned into sets in accor-
dance with the a priori hypothesis, a test for the independence
of the sets will indicate whether or not the hypothesis should
be rejected. Procedures have been developed to deal with
this situation, and also to obtain coefficients reflecting the
correlation between sets of correlations. The latter procedure
is known as canonical correlation. Further information about
these procedures may be found in Morrison.
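One standard way to obtain canonical correlations from a partitioned correlation matrix is via the singular values of R11^{-1/2} R12 R22^{-1/2}. The sketch below uses that route (it is not necessarily Morrison's computational scheme, and the 4-variable matrix is hypothetical):

```python
import numpy as np

def canonical_correlations(R, k):
    """Canonical correlations between the first k variables and the
    remaining ones, from a full correlation matrix R."""
    R11, R12, R22 = R[:k, :k], R[:k, k:], R[k:, k:]

    def inv_sqrt(A):
        # inverse symmetric square root via eigendecomposition
        w, V = np.linalg.eigh(A)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    M = inv_sqrt(R11) @ R12 @ inv_sqrt(R22)
    return np.linalg.svd(M, compute_uv=False)  # descending order

# Hypothetical matrix: variables {1,2} form one set, {3,4} the other
R = np.array([[1.0, 0.6, 0.3, 0.2],
              [0.6, 1.0, 0.2, 0.3],
              [0.3, 0.2, 1.0, 0.5],
              [0.2, 0.3, 0.5, 1.0]])
print(canonical_correlations(R, 2))
```

The resulting coefficients summarize how strongly the two sets of variables are related as wholes; values near zero for all of them would support the hypothesis that the sets are independent.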

Other Analyses of Covariance and Correlation
Matrices

In the analyses discussed so far, there have been a priori
considerations guiding the direction of the analysis. The sit-
uation may arise, however, in which the investigator wishes
to study the patterns in an obtained correlation or covariance
matrix without any appeal to a priori considerations. Let us
suppose, for example, that a large number of measurements
relevant to weather prediction have been taken, and the
investigator wishes to look for patterns among the variables.
Or suppose that a large number of demographic variables
have been measured on a human population. Again, it is rea-
sonable to ask if certain of these variables show a tendency
to be more closely related than others, in the absence of any
knowledge about their actual relations. Such analyses may
be useful in situations where large numbers of variables are
known to be related to a single problem, but the relationships
among the variables are not well understood. An investiga-
tion of the correlation patterns may reveal consistencies in
the data which will serve as clues to the underlying process.
The classic case for the application of such techniques
has been the study of the human intellect. In this case, cor-
relations among performances on a very large number of
tasks have been obtained and analyzed, and many theories
about the underlying skills necessary for intellectual func-
tion have been derived from such studies. The usefulness of
the techniques is by no means limited to psychology, how-
ever. Increasingly, they are being applied in other fields, as
diverse as biology (Fisher and Yates, 1964) and archaeology
(Chenhall, 1968). Principal component analysis, a closely
related technique, has been used in hydrology.
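The principal component idea can be illustrated on a hypothetical correlation matrix in which two blocks of variables are tightly correlated within blocks but only weakly across them (the numbers below are illustrative, not drawn from any study):

```python
import numpy as np

# Hypothetical correlation matrix: variables 1-2 form one cluster,
# variables 3-4 another, with weak cross-correlation
R = np.array([[1.0, 0.8, 0.1, 0.1],
              [0.8, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.8],
              [0.1, 0.1, 0.8, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print(eigvals)   # [2.0, 1.6, 0.2, 0.2]: two large components
```

Here the first two components carry (2.0 + 1.6)/4 = 90% of the total variance, mirroring the intuition that two underlying factors, one per block, account for most of the pattern.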
One of the more extensively developed techniques for the
analysis of correlation matrices is that of factor analysis. To
introduce the concepts underlying factor analysis, imagine a
correlation matrix in which the first x variables and the last
n − x variables are all highly correlated with each other,
but the correlation between any of the first x and any of the
second n − x variables is very low. One might suspect that
there is some underlying factor which influences the first set
of variables, and another which influences the second set of
variables, and that these two factors are relatively indepen-
dent statistically, since the variables which they influence
are not highly correlated. The conceptual simplification is
obvious; instead of worrying about the relationships among
n variables as reflected in their n(n − 1)/2 correlations, the

