Encyclopedia of Environmental Science and Engineering, Volume I and II

(Ben Green) #1

1126 STATISTICAL METHODS FOR ENVIRONMENTAL SCIENCE


importance in environmental work. Others are encountered
occasionally, such as the exponential distribution, which
has been used to compute probabilities in connection with
the expected failure rate of equipment. The distribution of
times between occurrences of events in a Poisson process is
described by the exponential distribution, and this distribution
is important in the theory of such stochastic processes (Parzen, 1962).
Further discussion of continuous distributions may be found
in Freund (1962) or most other standard statistical texts.
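As a small illustration of the two uses just mentioned, the sketch below computes the probability that an exponentially distributed waiting time exceeds a given value, P(T > t) = exp(-λt), and checks by simulation that waiting times between Poisson events average 1/λ. The failure rate of 0.5 per year and the function names are hypothetical, chosen only for the example.

```python
import math
import random

def exponential_survival(t: float, rate: float) -> float:
    """P(T > t): probability that the waiting time T exceeds t, for an
    exponential distribution with the given rate (events per unit time)."""
    return math.exp(-rate * t)

def simulate_waiting_times(rate: float, n: int, seed: int = 0) -> list:
    """Draw n times between successive events of a Poisson process with the given rate."""
    rng = random.Random(seed)
    return [rng.expovariate(rate) for _ in range(n)]

# Hypothetical equipment that fails on average 0.5 times per year.
rate = 0.5
print(f"P(no failure within 2 years) = {exponential_survival(2.0, rate):.4f}")  # exp(-1), about 0.3679

waits = simulate_waiting_times(rate, 100_000)
print(f"mean simulated waiting time: {sum(waits) / len(waits):.2f} years (theory: {1 / rate:.2f})")
```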
A special distribution problem often encountered in
environmental work concerns the occurrence of extreme
values of variables described by any one of several
distributions. For example, in forecasting floods when
planning construction, or droughts in connection with
such problems as stream pollution, the concern is with the
most extreme values to be expected. To deal with such
problems, the asymptotic theory of extreme values of a
statistical variable has been developed. Special tables are
available for estimating the expected extreme values of
several distributions whose extremes are unlimited in range.
Some information is also available for distributions with
restricted ranges. An interesting application of this theory
to predicting the occurrence of unusually high tides may be
found in Pfafflin (1970) and the Delta Commission Report
(1960). Further discussion may be found in Gumbel.
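One common model from this theory is the Gumbel (Type I) extreme value distribution, F(x) = exp(-exp(-(x - μ)/β)). The sketch below fits it to annual maxima by the method of moments, as a simple stand-in for the special tables mentioned above, and then computes the T-year return level, the value whose annual probability of being exceeded is 1/T. The data and function names are hypothetical, and a real flood or drought study would also check the fit and account for estimation uncertainty.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_moments(annual_maxima):
    """Method-of-moments fit of the Gumbel (Type I extreme value) distribution.

    Uses mean = mu + gamma * beta and standard deviation = pi * beta / sqrt(6).
    Returns (mu, beta): location and scale.
    """
    beta = statistics.stdev(annual_maxima) * math.sqrt(6) / math.pi
    mu = statistics.mean(annual_maxima) - EULER_GAMMA * beta
    return mu, beta

def return_level(mu, beta, period_years):
    """Level exceeded on average once every `period_years` years, i.e. the
    value x with F(x) = 1 - 1/T, where F(x) = exp(-exp(-(x - mu) / beta))."""
    return mu - beta * math.log(-math.log(1.0 - 1.0 / period_years))

# Hypothetical annual maximum river stages (m) from ten years of records.
maxima = [3.1, 2.7, 3.8, 4.2, 2.9, 3.5, 5.1, 3.3, 3.0, 4.6]
mu, beta = fit_gumbel_moments(maxima)
for T in (10, 50, 100):
    print(f"{T:>3}-year level: {return_level(mu, beta, T):.2f} m")
```

With only ten years of data, the 100-year level is an extrapolation well beyond the record, which is exactly the situation the asymptotic theory is meant to address.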

HYPOTHESIS TESTING

Sampling Considerations

A basic consideration in the application of statistical pro-
cedures is the selection of the data. In parameter estimation
and hypothesis testing, sample data are used to make
inferences about some larger population. The data are assumed to
be a random sample from this population. By random we
mean that the sample has been selected in such a way that
the probability of obtaining any particular sample value
is the same as its probability in the sampled population.
When the data are collected, care must be taken to ensure that
they are a random sample from the population of interest and
that there are no biases in the selection process which would
make the samples unrepresentative. Otherwise, valid inferences
cannot be made from the sample to the sampled population.
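When the population can be listed as a finite sampling frame, the simplest way to meet this requirement is a simple random sample drawn without replacement, in which every unit is equally likely to be chosen. The sketch below uses Python's standard `random` module in place of a table of random numbers; the frame of 200 monitoring wells is hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a sampling frame without replacement,
    so that every unit (and every subset of size n) is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: 200 numbered monitoring wells.
wells = [f"well-{i:03d}" for i in range(1, 201)]
chosen = simple_random_sample(wells, 10, seed=42)
print(chosen)
```

Fixing the seed makes the selection reproducible, which is useful for documenting how a sample was drawn.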
The procedures necessary to ensure that these conditions
are met will depend in part upon the particular problem being
studied. A basic principle, however, which applies in all
experimental work is that of randomization. Randomization
means that the sample is taken in such a way that any uncon-
trolled variables which might affect the results have an equal
chance of affecting any of the samples. For example, in agri-
cultural studies when plots of land are being selected, the
assignment of different experimental conditions to the plots
of land should be done randomly, by the use of a table of
random numbers or some other randomizing process. Thus,
any differences which arise between the sample values as
a result of differences in soil conditions will have an equal
chance of affecting each of the samples.
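The random assignment of experimental conditions to plots described above can be sketched as a random permutation of treatment labels. This is a minimal, completely randomized sketch assuming that each condition is to be used on the same number of plots; the plot names and condition labels are hypothetical.

```python
import random

def randomize_plots(plots, treatments, seed=None):
    """Completely randomized assignment: each treatment goes to an equal number
    of plots, and which plots get which treatment is decided by a random permutation.
    Assumes the number of plots is a multiple of the number of treatments."""
    if len(plots) % len(treatments) != 0:
        raise ValueError("number of plots must be a multiple of the number of treatments")
    labels = list(treatments) * (len(plots) // len(treatments))
    rng = random.Random(seed)
    rng.shuffle(labels)  # plays the role of the table of random numbers
    return dict(zip(plots, labels))

# Hypothetical layout: 12 plots, 3 experimental conditions.
plots = [f"plot-{i}" for i in range(1, 13)]
assignment = randomize_plots(plots, ["A", "B", "C"], seed=7)
for plot, condition in assignment.items():
    print(plot, condition)
```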
Randomization avoids error due to bias, but it does
nothing about uncontrolled variability. Variability can be
reduced by holding constant other parameters which may
affect the experimental results. In a study comparing the
smog-producing effects of natural and artificial light, other
variables, such as temperature, chamber dilution, and so on,
were held constant (Laity, 1971). Note, however, that such
control also restricts generalization of the results to the con-
ditions used in the test.
Special sampling techniques may be used in some cases
to reduce variability. For example, suppose that in an agricul-
tural experiment, plots of land must be chosen from three dif-
ferent fields. These fields may then be incorporated explicitly
into the design of the experiment and used as control vari-
ables. Comparisons of interest would be arranged so that they
can be made within each field, if possible. It should be noted
that the use of control variables is not a departure from ran-
domization. Randomization should still be used in assigning
conditions within levels of a control variable. Randomization
is necessary to prevent bias from variables which are not
explicitly controlled in the design of the experiment.
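The arrangement just described, with the fields used as control variables and randomization carried out within each field, is usually called a randomized complete block design. The sketch below assumes, hypothetically, that each field contains exactly one plot per experimental condition; the field and condition names are illustrative only.

```python
import random

def randomized_blocks(blocks, treatments, seed=None):
    """Randomized complete block layout: every treatment appears once in each
    block, in an order randomized independently for each block."""
    rng = random.Random(seed)
    layout = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)
        layout[block] = order  # order[i] = treatment assigned to plot i of this block
    return layout

# Hypothetical: three fields as blocks, four experimental conditions.
layout = randomized_blocks(["field-1", "field-2", "field-3"], ["A", "B", "C", "D"], seed=3)
for field, order in layout.items():
    print(field, order)
```

Because every condition occurs in every field, comparisons between conditions can be made within fields, so differences between fields do not inflate the error of those comparisons.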
Considerations of random sampling and the selection
of appropriate control variables to increase precision of the
experiment and ensure a more accurate sample selection can
arise in connection with all areas using statistical methods.
They are particularly important in certain environmental
areas, however. In human population studies great care must
be taken in the sampling procedures to ensure that the samples
are representative. Simple random sampling techniques are
seldom adequate, and more complex procedures have been
developed. For further discussion of this kind of sampling,
see Kish (1965) and Yates (1965). In cloud seeding experiments,
sampling problems arise which may affect the generality of
the results (Bernier, 1967). Since most environmental experiments
involve variables which are affected by a wide variety of other
variables, sampling problems, especially the question of
generalization from experimental results, are very common. The
specific randomization procedures, control variables and
limitations on generalization of results will depend upon the
particular field in question, but any experiment in this area
should be designed with these problems in mind.
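One of the simplest of the more complex procedures used in population studies is stratified random sampling: the population is divided into strata and a simple random sample is drawn within each stratum. The sketch below uses proportional allocation and is only an illustration of the general idea, not a reproduction of the methods in Kish (1965) or Yates (1965); the strata and sample size are hypothetical.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Stratified random sample with proportional allocation.

    `strata` maps each stratum name to its list of units. Each stratum gets
    round(n * stratum_size / total_size) units, drawn by simple random sampling,
    so the total can differ slightly from n when the shares do not round evenly.
    """
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = min(round(n * len(units) / total), len(units))
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical frame of 1000 households in three strata.
households = {
    "urban": [f"u{i}" for i in range(600)],
    "suburban": [f"s{i}" for i in range(300)],
    "rural": [f"r{i}" for i in range(100)],
}
picked = stratified_sample(households, 50, seed=1)
print({name: len(units) for name, units in picked.items()})  # {'urban': 30, 'suburban': 15, 'rural': 5}
```

Proportional allocation guarantees that each stratum is represented in the sample in the same proportion as in the population, which a single simple random sample achieves only on average.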

Parameter Estimation

A common problem encountered in environmental work is
the estimation of population parameters from sample values.
Examples of such estimation questions are: What is the
“best” estimate of the mean of a population? Within what
range of values can the mean safely be assumed to lie?
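Both questions can be illustrated with the sample mean as a point estimate and an interval of plus or minus a few standard errors around it. The sketch below uses a large-sample normal approximation (about 1.96 standard errors for 95%); for a sample as small as the hypothetical one shown, a Student t quantile (about 2.26 with 9 degrees of freedom) would give a wider and more appropriate interval. The dissolved-oxygen readings are invented for the example.

```python
import math
import statistics

def mean_with_interval(data, confidence=0.95):
    """Sample mean and a large-sample (normal-approximation) confidence interval."""
    n = len(data)
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # standard error of the mean
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean, (mean - z * se, mean + z * se)

# Hypothetical daily dissolved-oxygen readings (mg/L).
readings = [7.9, 8.4, 7.2, 8.8, 8.1, 7.6, 8.3, 7.7, 8.0, 8.5]
mean, (lo, hi) = mean_with_interval(readings)
print(f"mean = {mean:.2f} mg/L, approx. 95% interval = ({lo:.2f}, {hi:.2f})")
```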
In order to answer such questions, we must decide what
is meant by a “best” estimate. Probably the most widely used
method of estimation is that of maximum likelihood, devel-
oped by Fisher (1958). A maximum likelihood estimate is one
which selects that parameter value for a distribution describing
