STATISTICAL METHODS FOR ENVIRONMENTAL SCIENCE 1131
Another way of looking at correlation is by consider-
ing the regression of one variable on another. Figure 7
shows the relation between two variables, for two sets of
bivariate data, one with a 0.0 correlation, the other with a
correlation of 0.75. Obviously, estimates of the value of one
variable based on values of the other are better in the case of
the higher correlation. The formula for the regression of y on
x is given by
$$y - \hat{\mu}_y = r_{xy}\,\frac{\hat{\sigma}_y}{\hat{\sigma}_x}\,(x - \hat{\mu}_x). \tag{25}$$
A similar equation exists for the regression of x on y.
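As a rough illustration of Eq. (25), the sketch below computes the regression line of y on x from the sample means, standard deviations, and correlation coefficient. The function name `regression_y_on_x` and the data are hypothetical, and the use of the n - 1 divisor for the standard deviations is an assumption (the slope does not depend on that choice, since the divisor cancels in the ratio).

```python
import math

def regression_y_on_x(xs, ys):
    """Return (slope, intercept) for the regression of y on x.

    The slope is r_xy * (sd_y / sd_x), so the fitted line passes
    through the point (mean_x, mean_y).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (n - 1 divisor is an assumption).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Pearson correlation coefficient r_xy.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    r_xy = cov_xy / (sd_x * sd_y)
    slope = r_xy * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical measurements, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = regression_y_on_x(xs, ys)
estimates = [intercept + slope * x for x in xs]
```

When the correlation is zero the slope is zero, and the estimate of y reduces to its mean for every x, which is why the estimates are better for the higher correlation.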
A number of other correlation measures are available.
For ranked data, the Spearman correlation coefficient or
Kendall's tau is often used. Measures of correlation appropriate
for frequency data also exist. See Siegel.
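The two rank measures can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, the Spearman coefficient is computed as the ordinary correlation of the ranks (with tied values given their average rank), and the Kendall measure shown is the simplest variant (often called tau-a), which makes no correction for ties.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Ordinary product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman(xs, ys):
    """Spearman coefficient: the correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))

def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)
```

Both measures depend only on the ordering of the observations, so any monotonically increasing relation between the two variables gives a value of 1.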
MULTIVARIATE ANALYSIS
Measurements may be available on more than two variables
for each experiment. The environmental field is one which
offers great potential for multivariate measurement. In areas of
environmental concern such as water quality, population stud-
ies, or the study of the effects of pollutants on organisms, to
name only a few, there are often several variables which are of
interest. The prediction of phenomena of environmental interest,
such as rainfall or floods, typically involves the consideration
of many variables. This section will be concerned with
some problems in the analysis of multivariate data.
Multivariate Distributions
In considering multivariate distributions, it is useful to define
the n-dimensional random variable X as the vector
$$X' = [X_1, X_2, \ldots, X_n]. \tag{26}$$
The elements of this vector will be assumed to be continuous
unidimensional random variables, with density functions
$f_1(x_1), f_2(x_2), \ldots, f_n(x_n)$ and distribution functions
$F_1(x_1), F_2(x_2), \ldots, F_n(x_n)$. Such a vector also has a joint
distribution function,
$$F(x_1, x_2, \ldots, x_n) = P(X_1 \le x_1, X_2 \le x_2, \ldots, X_n \le x_n) \tag{27}$$
where P refers to the probability of all the stated conditions
occurring simultaneously.
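For a finite sample, the joint distribution function can be approximated by the fraction of observations that satisfy all of the conditions at once. The sketch below assumes each observation is a tuple of measurements; the function name `empirical_joint_cdf` and the data are hypothetical.

```python
def empirical_joint_cdf(sample, point):
    """Fraction of observations whose every component is <= the
    corresponding component of point (all conditions simultaneously)."""
    count = sum(
        1 for obs in sample
        if all(o <= p for o, p in zip(obs, point))
    )
    return count / len(sample)

# Hypothetical bivariate observations, for illustration only.
sample = [(7.1, 3.2), (6.8, 4.0), (7.4, 2.9), (6.5, 3.7)]
value = empirical_joint_cdf(sample, (7.0, 3.8))
```

Only the last observation in this sample satisfies both conditions, so the estimate at (7.0, 3.8) is one in four.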
The concepts considered previously in regard to univariate
distributions may be generalized to multivariate distributions.
Thus, the expected value of the random vector X,
analogous to the mean of the univariate distribution, is
$$E(X') = [E(X_1), E(X_2), \ldots, E(X_n)], \tag{28}$$
where the $E(X_i)$ are the expected values, or means, of the
univariate distributions.
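In practice the expected value of each element is estimated by the corresponding sample mean, so the estimated mean vector is simply the list of column means. A minimal sketch, with a hypothetical function name and observations stored as tuples:

```python
def mean_vector(sample):
    """Estimate the mean vector: one sample mean per variable."""
    n = len(sample)
    # zip(*sample) regroups the observations into one column per variable.
    return [sum(column) / n for column in zip(*sample)]

# Hypothetical observations on three variables, for illustration only.
observations = [(1.0, 10.0, 0.5), (3.0, 14.0, 1.5)]
means = mean_vector(observations)
```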
Generalization of the concept of variance is more com-
plicated. Let us start by considering the covariance between
two variables,
$$\sigma_{ij} = E\{[X_i - E(X_i)][X_j - E(X_j)]\}. \tag{29}$$
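A sample analogue of Eq. (29) replaces each expectation with an average over the observations. The sketch below is one way this might be written; the function names are hypothetical, and the n - 1 divisor is an assumption (dividing by n gives the other common convention).

```python
def covariance(xs, ys):
    """Sample covariance of two variables (n - 1 divisor assumed)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def covariance_matrix(sample):
    """Covariances between every pair of variables in the sample.

    Entry [i][j] is the covariance of the i-th and j-th variables.
    """
    columns = list(zip(*sample))  # one column per variable
    return [[covariance(ci, cj) for cj in columns] for ci in columns]

# Hypothetical bivariate observations, for illustration only.
sample = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
matrix = covariance_matrix(sample)
```

The resulting matrix is symmetric, since the covariance of the ith and jth variables does not depend on the order in which they are taken.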
The covariances between each pair of elements of the vector
X can be computed; the covariance of the ith and jth elements
will be designated as $\sigma_{ij}$. If i = j, the covariance is the
[FIGURE 7: two panels of bivariate data, one with r = 0.0 and one with r = 0.75; axis label X]