variance of X_i, and will be designated as s_ij. The generalization
of the concept of variance to a multidimensional variable
then becomes the matrix of variances and covariances. This
matrix will be called the covariance matrix. The covariance
matrix for the population is given as
$$
\Sigma =
\begin{bmatrix}
s_{11} & s_{12} & \cdots & s_{1n} \\
s_{21} & s_{22} & \cdots & s_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
s_{n1} & s_{n2} & \cdots & s_{nn}
\end{bmatrix}.
\qquad (30)
$$
A second useful matrix is the matrix of correlations:
$$
\begin{bmatrix}
r_{11} & \cdots & r_{1n} \\
r_{21} & \cdots & r_{2n} \\
\vdots & \ddots & \vdots \\
r_{n1} & \cdots & r_{nn}
\end{bmatrix}.
\qquad (31)
$$
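Both matrices are easily estimated from a data matrix. The following is a minimal sketch in Python/NumPy that computes the sample covariance and correlation matrices for a simulated data set; the data, sample size, and number of variables are invented purely for illustration.

```python
import numpy as np

# Hypothetical data matrix: 50 observations (rows) on n = 3 variables (columns).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

S = np.cov(X, rowvar=False)       # sample covariance matrix, as in (30)
R = np.corrcoef(X, rowvar=False)  # matrix of correlations, as in (31)

print(S)
print(R)
```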
If the assumption is made that each of the individual vari-
ables is described by a normal distribution, then the distri-
bution of X may be described by the multivariate normal
distribution. This assumption will be made in subsequent
discussion, except where noted to the contrary.
Tests on Means
Suppose that measures have been obtained on several vari-
ables for a sample, and it is desired to determine whether that
sample came from some known population. Or there may be
two samples; for example, suppose data have been gathered
on physiological effects of two concentrations of SO₂ for
several measures of physiological functioning and the inves-
tigator wishes to know if they should be regarded as samples
from the same population. In such situations, instead of using
t-tests to determine the significance of each individual difference separately, it would be desirable to be able to perform one test, analogous to the t-test, on the vectors of the means.
A test, known as Hotelling’s T^2 test, has been developed
for this purpose. The test does not require that the popula-
tion covariance matrix be known. It does, however, require
that samples to be compared come from populations with
the same covariance matrix, an assumption analogous to the
constant variance requirement of the t-test.
To understand the nature of T^2 in the single sample case,
consider a single random variable made up of any linear
combination of the n variables in the vector X (all of the
variables must enter into the combination, that is, none of the
coefficients may be zero). This variable will have a normal
distribution, since it is a sum of normal variables, and it can
be compared with a linear combination of elements from
the vector for the population with the same coefficients, by
means of a t-test. We then adopt the decision rule that the null hypothesis will be accepted only if it would be accepted for every possible linear combination of the variables; equivalently, the decision rests on the largest value of t obtainable over all such combinations. By maximizing t^2 as a function of the coefficients of the combination, T^2 can be derived. Similar
arguments can be used to derive T^2 for two samples.
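As a concrete sketch of the two-sample case, the computation below builds T^2 from the difference of the mean vectors and the pooled covariance matrix, then refers it to the F distribution, under the equal-covariance assumption noted above. The function name, data, sample sizes, and number of variables are illustrative assumptions, not taken from the source.

```python
import numpy as np
from scipy import stats

def hotelling_t2_two_sample(X1, X2):
    """Two-sample Hotelling T^2, assuming equal population covariance matrices."""
    n1, p = X1.shape
    n2, _ = X2.shape
    d = X1.mean(axis=0) - X2.mean(axis=0)              # difference of mean vectors
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S_pooled, d)
    # Convert T^2 to an F statistic with (p, n1 + n2 - p - 1) degrees of freedom.
    F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
    p_value = stats.f.sf(F, p, n1 + n2 - p - 1)
    return T2, F, p_value

# Hypothetical example: four physiological measures under two SO2 concentrations.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(20, 4))
X2 = rng.normal(loc=0.5, size=(25, 4))
print(hotelling_t2_two_sample(X1, X2))
```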
A related function of the mean is known as the linear
discriminant function. The linear discriminant function is
defined as the linear compound which generates the largest
T^2 value. The coefficients used in this compound provide
the best weighting of the variables of a multivariate obser-
vation for the purpose of deciding which population gave
rise to an observation. A limitation on the use of the linear
discriminant function, often ignored in practice, is that it
requires that the parameters of the population be known, or
at least be estimated from large samples. This statistic has
been used in analysis of data from monitoring stations to
determine whether pollution concentrations exceed certain
criterion values.
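A minimal sketch of how the coefficients of such a discriminant might be estimated from two samples is given below, assuming a common covariance matrix; the function names and classification rule (assignment by comparison with the midpoint of the two group means on the discriminant scale) are illustrative choices, not the source's prescription.

```python
import numpy as np

def linear_discriminant(X1, X2):
    """Coefficients of the linear compound that maximizes the two-sample T^2
    (Fisher's linear discriminant), assuming a common covariance matrix."""
    n1, n2 = len(X1), len(X2)
    S_pooled = ((n1 - 1) * np.cov(X1, rowvar=False) +
                (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(S_pooled, X1.mean(axis=0) - X2.mean(axis=0))
    # Midpoint of the two group means on the discriminant scale.
    cutoff = w @ (X1.mean(axis=0) + X2.mean(axis=0)) / 2
    return w, cutoff

def classify(x, w, cutoff):
    # Assign a new observation to population 1 if its score exceeds the midpoint.
    return 1 if w @ x > cutoff else 2
```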
Other statistical procedures employing mean vectors are
useful in certain circumstances. See Morrison for a further
discussion of this question.
Multivariate Analysis of Variance (MANOVA)
Just as the concepts underlying the t-test could be general-
ized to the comparison of more than two means, the concepts
underlying the comparison of two mean vectors can be gen-
eralized to the comparison of several vectors of means.
The nature of this generalization can be understood in
terms of the linear model, considered previously in connec-
tion with analysis of variance. In the multivariate situation,
however, instead of having a single observation which is
hypothesized to be made up of several components com-
bined additively, the observations are replaced by vectors of
observations, and the components by vectors of components.
The motivation behind this generalization is similar to that
for Hotelling’s T^2 test: it permits a test of the null hypothesis
for all of the variables considered simultaneously.
Unlike the case of Hotelling’s T^2, however, various
methods of test construction do not converge on one test sta-
tistic, comparable to the F test for analysis of variance. At
least three test statistics have been developed for MANOVA,
and the powers of the various tests in relation to each other
are very incompletely known.
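To make the relationship among these competing statistics concrete, the sketch below computes three of them (Wilks' lambda, Pillai's trace, and the Hotelling-Lawley trace) from the between-groups and within-groups SSCP matrices of a simple one-way layout. The function and data structure are assumptions made for illustration.

```python
import numpy as np

def manova_statistics(groups):
    """Three common MANOVA test statistics from a list of (n_i x p) group samples."""
    X = np.vstack(groups)
    grand_mean = X.mean(axis=0)
    # Between-groups (hypothesis) and within-groups (error) SSCP matrices.
    H = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                              g.mean(axis=0) - grand_mean) for g in groups)
    E = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in groups)
    eig = np.linalg.eigvals(np.linalg.solve(E, H)).real  # eigenvalues of E^-1 H
    wilks_lambda = np.prod(1.0 / (1.0 + eig))
    pillai_trace = np.sum(eig / (1.0 + eig))
    hotelling_lawley = np.sum(eig)
    return wilks_lambda, pillai_trace, hotelling_lawley

# Hypothetical example: three groups of 15 observations on 3 variables.
rng = np.random.default_rng(2)
groups = [rng.normal(loc=mu, size=(15, 3)) for mu in (0.0, 0.3, 0.6)]
print(manova_statistics(groups))
```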
Other problems associated with MANOVA are similar in
principle to those associated with ANOVA, though computa-
tionally they are more complex. For example, the problem of
multiple comparison of means has its analogous problem in
MANOVA, that of determining which combinations of mean
vectors are responsible for significant test statistics. The
number and type of possible linear models can also ramify
considerably, just as in the case of ANOVA. For further dis-
cussion of MANOVA, see Morrison (1967) or Seal.
Extensions of Correlation Analysis
In a number of situations, where multivariate measurements
are taken, the concern of the investigator centers on the