Encyclopedia of Environmental Science and Engineering, Volume I and II


1134 STATISTICAL METHODS FOR ENVIRONMENTAL SCIENCE


investigator can attempt to identify and measure the factors
directly.
Factor analysis uses techniques from matrix algebra to
accomplish mathematically the process we have outlined
intuitively above. It attempts to determine the number of
factors, and also the extent to which each of these factors
influences the measured variables. Since unique solutions to
this problem do not exist, the technique has been the subject
of considerable debate, especially on the question of how
to determine the best set of factors. Nevertheless, it can be
useful in any situation where the relationships among a large
set of variables are not well understood.
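As a rough illustration of the matrix-algebra step, the sketch below extracts factors from a correlation matrix by principal-component extraction, only one of several extraction methods, and keeps the factors whose eigenvalues exceed 1 (the Kaiser criterion, a common but much-debated rule of thumb). The Jacobi eigenvalue routine, the names `jacobi_eigen` and `extract_factors`, and the example matrix are illustrative assumptions. Because factor solutions are not unique, the loadings shown are one solution among many; any rotation of them fits the correlations equally well.

```python
import math


def jacobi_eigen(a, tol=1e-18, max_sweeps=100):
    """Eigenvalues/eigenvectors of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors); column i of `vectors` belongs to values[i].
    """
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # accumulate eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def extract_factors(r):
    """Principal-component extraction with the Kaiser (eigenvalue > 1) rule."""
    values, vectors = jacobi_eigen(r)
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    kept = [i for i in order if values[i] > 1.0]
    # Loading of a variable on factor i = eigenvector entry * sqrt(eigenvalue).
    loadings = [[vectors[row][i] * math.sqrt(values[i]) for i in kept]
                for row in range(len(r))]
    return [values[i] for i in order], loadings


# Hypothetical correlations: variables 1-2 move together, as do variables 3-4.
R = [[1.0, 0.8, 0.0, 0.0],
     [0.8, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.6],
     [0.0, 0.0, 0.6, 1.0]]
eigenvalues, loadings = extract_factors(R)
print("eigenvalues:", [round(x, 3) for x in eigenvalues])
print("factors kept:", len(loadings[0]))
```

With this block-structured matrix, the two large eigenvalues (1.8 and 1.6) point to two underlying factors, each loading on one pair of variables; real correlation matrices are rarely this clean, which is one reason the choice of the number of factors remains contested.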

ADDITIONAL PROCEDURES

Multidimensional Scaling and Clustering

There is a group of techniques whose use is motivated by
considerations similar to those underlying the analysis of
correlation matrices, but which are applied directly to matrices
of the distances, or similarities, between various stimuli.
Suppose, for example, that people have been asked to judge
the similarity of various countries. These judgments may
be scaled by multidimensional techniques to discover how
many dimensions underlie the judgments. Do people make
such judgments along a single dimension? Or are several
dimensions involved? An interesting example of this sort
was recently analyzed by Wish (1972). Sophisticated tech-
niques have been worked out for such procedures.
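The simplest such procedure, classical (Torgerson) metric scaling, can be sketched as follows: square the distances, double-center them, and take the leading eigenvectors; the number of clearly positive eigenvalues suggests how many dimensions the data require. This sketch assumes the distances are on an interval or ratio scale. The nonmetric methods often applied to similarity judgments use only their rank order and need iterative fitting that is not shown here. The function names, the Jacobi eigen-solver, and the example distances are illustrative assumptions.

```python
import math


def jacobi_eigen(a, tol=1e-18, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def classical_mds(d, dims):
    """Classical scaling of a symmetric distance matrix into `dims` dimensions."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(r) / n for r in sq]
    grand = sum(row_means) / n
    # B = -1/2 * J D^2 J, i.e. the double-centred squared distances.
    b = [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand) for j in range(n)]
         for i in range(n)]
    values, vectors = jacobi_eigen(b)
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    coords = [[vectors[row][i] * math.sqrt(max(values[i], 0.0)) for i in order[:dims]]
              for row in range(n)]
    return [values[i] for i in order], coords


# Hypothetical dissimilarities among three stimuli (a 3-4-5 triangle).
D = [[0.0, 3.0, 4.0],
     [3.0, 0.0, 5.0],
     [4.0, 5.0, 0.0]]
eigenvalues, coords = classical_mds(D, dims=2)
print("eigenvalues:", [round(x, 3) for x in eigenvalues])
print("configuration:", [[round(c, 3) for c in row] for row in coords])
```

Here two eigenvalues are positive and the third is essentially zero, indicating that two dimensions suffice; the recovered configuration reproduces the original distances up to rotation and reflection.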
Multidimensional scaling has been most extensively
used in psychology, where the structure underlying simi-
larity or distance measurements may not be at all obvious
without such procedures. Some of these applications are of
potential importance in the environmental field, especially
in areas such as urban planning, where decisions must take
into account human reactions. They are not limited to such
situations, however, and some intriguing applications have
been made in other fields.
A technique related in some ways to multidimensional
scaling is cluster analysis. Clustering techniques
can be applied to the same sort of data as multidimensional
scaling procedures. However, the aim is somewhat differ-
ent. Instead of looking for dimensions assumed to underlie
the data, clustering techniques try to define related clusters
of stimuli. These may then be grouped into larger clusters,
and so on, until a hierarchical structure is defined. If the
data are sufficiently structured, a tree may be derived.
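One way to see how such a hierarchy emerges is agglomerative clustering: start with each stimulus as its own cluster and repeatedly merge the two closest clusters, recording the distance at which each merge occurs. The sketch below uses the single-linkage (nearest-neighbour) rule, one of several possible definitions of the distance between clusters; the function name `single_linkage` and the example distance matrix are illustrative assumptions. The list of merges it returns is the tree (dendrogram) described above.

```python
def single_linkage(labels, dist):
    """Agglomerative clustering with the single-linkage rule.

    `dist` is a symmetric matrix of dissimilarities indexed like `labels`.
    Returns the merges as (cluster_a, cluster_b, height), in merge order.
    """
    clusters = {i: (labels[i],) for i in range(len(labels))}  # id -> label tuple
    members = {i: [i] for i in range(len(labels))}            # id -> item indices
    merges = []
    next_id = len(labels)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for x in range(len(ids)):
            for y in range(x + 1, len(ids)):
                a, b = ids[x], ids[y]
                # Single linkage: distance between the closest pair of members.
                d = min(dist[i][j] for i in members[a] for j in members[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    return merges


# Hypothetical dissimilarities among four stimuli.
labels = ["A", "B", "C", "D"]
dist = [[0, 1, 6, 7],
        [1, 0, 5, 8],
        [6, 5, 0, 2],
        [7, 8, 2, 0]]
for left, right, height in single_linkage(labels, dist):
    print(f"merge {left} + {right} at height {height}")
```

A and B join first, then C and D, and the two pairs join at the top of the tree. Swapping `min` for `max` in the linkage line gives complete linkage, which can produce a different tree from the same data.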
A wide variety of clustering techniques have been
explored, and interest seems to be increasing (Johnson,
1967). The procedures used depend upon the principles
used to define the clusters. Clustering techniques have been
applied in a number of different fields. Biologists, for
example, have used them to study the relationships among
various animals, a form of numerical taxonomy.
The requirements which the data must meet for multidimensional
scaling and clustering procedures to apply are usually somewhat
less stringent than in the case of the multivariate procedures
discussed previously. Multidimensional scaling in psychology is
often done on data for which an interval scale of measurement
cannot be assumed. Distance measures for clustering may be
obtained from the clustering judgments of a number of
individuals, judgments which themselves lack an ordinal scale.
This relative freedom is also useful in many applications where
the order of items is known, but the equivalence of the distances
between items measured at different points is questionable.

Stochastic Processes

A stochastic or random process is any process which includes
a random element in its description. The term stochastic
process is also frequently used for the mathematical model
of any actual stochastic process. Stochastic
models have been developed in a number of areas of envi-
ronmental concern.
Many stochastic processes involve space or time as a
primary variable. Bartlett (1960) in his discussion of eco-
logical frequency distributions begins with the application of
the Poisson distribution to animal populations whose density
is assumed to be homogeneous over space, and then goes
on to develop the consequences of assuming heterogeneous
distributions, which are shown to lead to other distributions,
such as the negative binomial. The Sutton equation for the
diffusion of gases applied to stack effluents, a simplification
of which was given earlier for a single dimension (Strom,
1968), is another example of a situation in which statistical
considerations about the physical process lead to a spatial
model, in this case one involving two dimensions.
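The contrast Bartlett describes can be illustrated by simulation, assuming organisms are counted in quadrats. If density is the same everywhere, counts per quadrat are Poisson and their variance equals their mean; if the underlying density itself varies from quadrat to quadrat according to a gamma distribution, the counts follow a negative binomial distribution and the variance exceeds the mean. The quadrat setup, the parameter values, and the function names are hypothetical, chosen only to show the variance-to-mean ratio as a simple index of heterogeneity.

```python
import math
import random
import statistics


def poisson_draw(lam, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_counts(n_quadrats, mean, shape=None, seed=1):
    """Counts per quadrat; `shape` set means gamma-distributed local density."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_quadrats):
        if shape is None:
            lam = mean  # homogeneous density: pure Poisson counts
        else:
            # Heterogeneous density with the same overall mean (gamma mean = shape * scale).
            lam = rng.gammavariate(shape, mean / shape)
        counts.append(poisson_draw(lam, rng))
    return counts


def variance_to_mean(counts):
    return statistics.variance(counts) / statistics.mean(counts)


homogeneous = simulate_counts(5000, mean=4.0, seed=1)
patchy = simulate_counts(5000, mean=4.0, shape=2.0, seed=2)
print("homogeneous variance/mean:", round(variance_to_mean(homogeneous), 2))
print("heterogeneous variance/mean:", round(variance_to_mean(patchy), 2))
```

In theory the first ratio is 1 and the second is 1 + mean/shape = 3, since a gamma-mixed Poisson with mean m and shape k has variance m + m²/k; the simulated ratios scatter around those values.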
Time is an important variable in many stochastic models.
A number of techniques have been developed for the analy-
sis of time series. Many of the concepts we have already con-
sidered, such as the mean and variance, can be generalized to
time series. The autocorrelation function, which consists of
the correlation of a series with itself at various time lags,
is often applied to time series data. This function is useful in
revealing periodicities in the data, which show up as peaks in
the function. Various modifications of this concept have been
developed to deal with data which are distributed in discrete
steps over time. Time series data, especially discrete time series
data, often arise in such areas as hydrology and the study of air
pollution, where sampling is done over time. Such sampling is
often combined with spatial sampling, as when meteorological
measurements are made at a number of stations.
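A sample autocorrelation function can be computed directly from its definition, the correlation of the series with a lagged copy of itself. The sketch below uses the common estimator that divides each lag's sum of products by the full-series sum of squares; the function name and the synthetic monthly series with a 12-step cycle are hypothetical, chosen so that the periodicity shows up as a peak at lag 12 and a trough at lag 6.

```python
import math


def autocorrelation(x, max_lag):
    """Sample autocorrelation r_k for k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    # r_k = sum of (x_t - mean)(x_{t+k} - mean) over t, divided by the sum of squares.
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]


# Hypothetical series: ten years of monthly values with an annual cycle.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
acf = autocorrelation(series, 24)
for lag in (1, 3, 6, 12):
    print(f"lag {lag:2d}: r = {acf[lag]:+.3f}")
```

For this series the value at lag 12 is 0.9 rather than 1, because the estimator sums only n - k products at lag k while keeping the full-series denominator; this damping grows with the lag.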
An important consideration in connection with time
series is whether the series is stationary or non-stationary.
Stationarity of a time series implies that the behavior of the
random variables involved does not depend on the time at
which observation of the series is begun. The assumption of
stationarity simplifies the statistical treatment of time series.
Unfortunately, it is often difficult to justify for environmen-
tal measurements, especially those taken over long time
periods. Examination of time series for evidence of non-sta-
tionarity can be a useful procedure, however; for example,
it may help in determining whether long-term climatic
changes are occurring (Quick, 1992). For further discussion
of time series analysis, see Anderson.
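The text does not prescribe a particular test, but one widely used check for a monotonic trend, one common form of non-stationarity in hydrological and climatic records, is the Mann-Kendall test. The sketch below implements its normal approximation without the correction for tied values; the function name and the toy series are illustrative assumptions. A test of this kind detects only drifts in level, not changes in variance or in seasonal pattern.

```python
import math


def mann_kendall(x):
    """Mann-Kendall trend test (normal approximation, no tie correction).

    Returns (S, Z, two-sided p-value).
    """
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)  # sign of later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p-value from the standard normal CDF, written with erf.
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p


# Hypothetical annual means with a slight upward drift plus noise.
annual = [14.1, 14.0, 14.3, 14.2, 14.4, 14.35, 14.6, 14.5, 14.7, 14.8]
s, z, p = mann_kendall(annual)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests a monotonic drift, but serial correlation in the data inflates the apparent significance, so in practice such a result is a prompt for closer study rather than proof of climatic change.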
