1130 STATISTICAL METHODS FOR ENVIRONMENTAL SCIENCE
becomes cumbersome for problems of any complexity, and
a number of computer programs are available for analyzing
various designs. The Biomedical Statistical Programs (Ed. by
Dixon 1967) are frequently used for this purpose. A method
recently developed by Fowlkes (1969) permits a particularly
simple specification of the design problem and has the flex-
ibility to handle a wide variety of experimental designs.
SPECIAL ESTIMATION PROBLEMS
The estimation problems we have considered so far have
involved single experiments, or sets of data. In environmen-
tal work, the problem of arriving at an estimate by combin-
ing the results of a series of tests often arises. Consider, for
example, the problem of estimating the coliform bacteria
population size in a specimen of water from a series of dilu-
tion tests. Samples from the water specimen are diluted by
known amounts. At some point, the dilution becomes so
great that the brilliant green lactose bile broth test for the
presence of coliform bacteria becomes negative (Fair and
Geyer, 1954). From the amount of dilution necessary to
obtain a negative test, plus the assumption that one organism
is enough to yield a positive response, it is possible to esti-
mate the original population size in the water specimen.
In making such an estimate, it is unsatisfactory simply
to use the first negative test to estimate the population size.
Since the diluted samples may differ from one another, it is
possible to get a negative test followed by one or more posi-
tive tests. It is desirable, rather, to estimate the population
from the entire series of tests. This can be done by setting
up a combined hypothesis based on the joint probabilities of
all the obtained results, and using likelihood estimation pro-
cedures to arrive at the most likely value for the population
parameter, which is known as the Most Probable Number
(MPN) (Fair and Geyer, 1954). Tables have been prepared
for estimating the MPN for such tests on this principle, and
similar procedures can be used to arrive at the results of a set
of tests in other situations.
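The likelihood calculation behind the MPN can be sketched as follows. This is a minimal illustration, not the tabulated procedure of Fair and Geyer: it assumes a Poisson model in which a tube is positive whenever it receives at least one organism, and the function names and the crude log-scale grid search are choices made here for clarity.

```python
import math

def mpn_log_likelihood(lam, series):
    """Log-likelihood of density lam (organisms per unit volume) for a
    dilution series, assuming one organism suffices for a positive tube.
    series: list of (volume, positive_tubes, total_tubes) per dilution."""
    ll = 0.0
    for volume, positives, total in series:
        # Poisson model: P(tube receives at least one organism)
        p_pos = 1.0 - math.exp(-lam * volume)
        ll += positives * math.log(p_pos)
        ll += (total - positives) * (-lam * volume)  # log P(negative tube)
    return ll

def estimate_mpn(series, lo=1e-3, hi=1e4, steps=20000):
    """Grid search on a log scale for the density maximizing the likelihood."""
    best_lam, best_ll = lo, float("-inf")
    for i in range(steps):
        lam = lo * (hi / lo) ** (i / (steps - 1))
        ll = mpn_log_likelihood(lam, series)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam
```

For a five-tube series at volumes of 10, 1, and 0.1 ml, the estimate rises as more tubes at the higher dilutions turn positive, as expected.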
Sequential testing is a problem that sometimes arises in
environmental work. So far, we have assumed that a con-
stant amount of data is available. However, very often, the
experimenter is making a series of tests, and wishes to know
whether he has enough data to make a decision at a given
level of reliability, or whether he should consider taking
additional data. Such estimation problems are common in
quality control, for example, and may arise in connection
with monitoring the effluent from various industrial pro-
cesses. Statistical procedures have been developed to deal
with such questions. They are discussed in Wald.
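The essence of Wald's sequential procedure can be shown in a short sketch. The example below assumes a simple pass/fail monitoring situation (each effluent sample either meets a standard or does not) and tests a null proportion p0 against an alternative p1; the function name and the framing are illustrative, while the decision thresholds follow Wald's standard approximations in terms of the error rates alpha and beta.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for H0: p = p0 vs H1: p = p1.
    observations: iterable of 0/1 outcomes, examined one at a time."""
    upper = math.log((1 - beta) / alpha)   # cross above: decide for H1
    lower = math.log(beta / (1 - alpha))   # cross below: decide for H0
    llr = 0.0                              # accumulated log-likelihood ratio
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n
    return "continue sampling", n          # data exhausted without a decision
```

The test stops as soon as the evidence is strong enough in either direction, so the sample size needed is itself an outcome of the data.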
CORRELATION AND RELATED TOPICS
So far we have discussed situations involving a single vari-
able. However, it is common to have more than one type
of measure available on the experimental units. The sim-
plest case arises where values for two variables have been
obtained, and the experimenter wishes to know how these
variables relate to one another.
Curve Fitting
One problem which frequently arises in environmental work
is the fitting of various functions to bivariate data. The sim-
plest situation involves fitting a linear function to the data
when all of the variability is assumed to be in the Y variable.
The most commonly used criterion for fitting such a function
is the minimization of the squared deviations from the line,
referred to as the least squares criterion. The application of
this criterion yields the following simultaneous equations:
\[
\sum_{i=1}^{n} Y_i = nA + B \sum_{i=1}^{n} X_i \qquad (22)
\]
and
\[
\sum_{i=1}^{n} X_i Y_i = A \sum_{i=1}^{n} X_i + B \sum_{i=1}^{n} X_i^2 . \qquad (22)
\]
These equations can be solved for A and B, the intercept and
slope of the best-fit line. More complicated functions may
also be fitted using the least squares criterion, and the
method may be generalized to the case of more than two
variables. Discussion of these procedures may be found in
Daniel and Wood.
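The two normal equations (22) are a pair of simultaneous linear equations in A and B and can be solved directly. The following sketch does so; the function name is an illustrative choice.

```python
def fit_line(xs, ys):
    """Solve the least-squares normal equations for the line Y = A + B*X,
    assuming all the variability is in the Y variable."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # The system:  sy = n*A + B*sx   and   sxy = A*sx + B*sxx
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b
```

For data lying exactly on a line, the procedure recovers the intercept and slope exactly.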
Correlation and Regression
Another method of analysis often applied to such data is
that of correlation. Suppose that our two variables are both
normally distributed. In addition to investigating their indi-
vidual distributions, we may wish to consider their joint
occurrence. In this situation, we may choose to compute the
Pearson product moment correlation between the two vari-
ables, which is given by
\[
r_{xy} = \frac{\operatorname{cov}(x_i, y_i)}{s_x s_y} \qquad (23)
\]
where cov(x_i, y_i), the covariance of x and y, is defined as
\[
\operatorname{cov}(x_i, y_i) = \frac{1}{n} \sum_{i=1}^{n} (x_i - m_x)(y_i - m_y) . \qquad (24)
\]
It is the most common measure of correlation. The square
of r gives the proportion of the variance in one of the
variables which can be predicted from knowledge of the
other variable. This correlation coefficient is appropriate
whenever the assumption of a normal distribution can be
made for both variables.
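Equations (23) and (24) translate directly into a short computation; the sketch below uses the population (divide-by-n) forms of the covariance and standard deviations, consistent with (24), and the function name is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation per equations (23) and (24)."""
    n = len(xs)
    mx = sum(xs) / n                       # mean of x (m_x)
    my = sum(ys) / n                       # mean of y (m_y)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)
```

Perfectly linear data give r = 1 (or −1 for a decreasing relationship), and r² is then the full proportion of predictable variance.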