Robert V. Hogg, Joseph W. McKean, Allen T. Craig

4.5 Introduction to Hypothesis Testing



Point estimation and confidence intervals are useful statistical inference procedures.
Another type of inference that is frequently used concerns tests of hypotheses. As
in Sections 4.1 through 4.3, suppose our interest centers on a random variable $X$
that has density function $f(x; \theta)$, where $\theta \in \Omega$. Suppose we think, due to theory
or a preliminary experiment, that $\theta \in \omega_0$ or $\theta \in \omega_1$, where $\omega_0$ and $\omega_1$ are disjoint
subsets of $\Omega$ and $\omega_0 \cup \omega_1 = \Omega$. We label these hypotheses as


$$H_0 : \theta \in \omega_0 \quad \text{versus} \quad H_1 : \theta \in \omega_1. \tag{4.5.1}$$

The hypothesis $H_0$ is referred to as the null hypothesis, while $H_1$ is referred to as
the alternative hypothesis. Often the null hypothesis represents no change or no
difference from the past, while the alternative represents change or difference. The
alternative is often referred to as the research worker’s hypothesis. The decision
rule to take $H_0$ or $H_1$ is based on a sample $X_1, \ldots, X_n$ from the distribution of $X$
and, hence, the decision could be wrong. For instance, we could decide that $\theta \in \omega_1$
when really $\theta \in \omega_0$ or we could decide that $\theta \in \omega_0$ when, in fact, $\theta \in \omega_1$. We label
these errors Type I and Type II errors, respectively, later in this section. As we
show in Chapter 8, a careful analysis of these errors can lead in certain situations
to optimal decision rules. In this section, though, we simply want to introduce the
elements of hypothesis testing. To set ideas, consider the following example.
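
As a brief aside before the example, the two ways a sample-based decision can go wrong can be illustrated by simulation. The sketch below is not from the text: it assumes a hypothetical parameter space $\Omega = \{0, 1\}$ with $\omega_0 = \{0\}$ and $\omega_1 = \{1\}$, normal data with unit variance, and an arbitrary decision rule (take $H_1$ when the sample mean exceeds a cutoff of 0.5). The function names, the cutoff, and the sample size are all illustrative assumptions.

```python
import random
import statistics


def decide(sample, cutoff):
    """Hypothetical decision rule: take H1 when the sample mean exceeds cutoff."""
    return "H1" if statistics.fmean(sample) > cutoff else "H0"


def error_rate(true_theta, wrong_decision, n=15, cutoff=0.5, reps=20000, seed=0):
    """Estimate how often the rule returns wrong_decision when theta = true_theta.

    Assumes X_1, ..., X_n are iid normal with mean true_theta and variance 1
    (an assumption made only for this illustration).
    """
    rng = random.Random(seed)
    wrong = 0
    for _ in range(reps):
        sample = [rng.gauss(true_theta, 1.0) for _ in range(n)]
        if decide(sample, cutoff) == wrong_decision:
            wrong += 1
    return wrong / reps


# theta = 0 lies in omega_0, so deciding H1 is an error.
rate_h1_when_h0 = error_rate(true_theta=0.0, wrong_decision="H1")
# theta = 1 lies in omega_1, so deciding H0 is an error.
rate_h0_when_h1 = error_rate(true_theta=1.0, wrong_decision="H0")

print(f"decide H1 although theta is in omega_0: {rate_h1_when_h0:.4f}")
print(f"decide H0 although theta is in omega_1: {rate_h0_when_h1:.4f}")
```

Both estimated rates are positive: because the rule depends on random data, neither kind of wrong decision can be ruled out. Moving the cutoff lowers one rate at the expense of the other.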


Example 4.5.1 (Zea mays Data). In 1878 Charles Darwin recorded some data
on the heights of Zea mays plants to determine what effect cross-fertilization or
self-fertilization had on the height of Zea mays. The experiment was to select one
cross-fertilized plant and one self-fertilized plant, grow them in the same pot, and
then later measure their heights. An interesting hypothesis for this example would
be that the cross-fertilized plants are generally taller than the self-fertilized plants.
This is the alternative hypothesis, i.e., the research worker’s hypothesis. The null
hypothesis is that the plants generally grow to the same height regardless of whether
they were self- or cross-fertilized. Data for 15 pots were recorded.
We represent the data as $(Y_1, Z_1), \ldots, (Y_{15}, Z_{15})$, where $Y_i$ and $Z_i$ are the heights
of the cross-fertilized and self-fertilized plants, respectively, in the $i$th pot. Let
$X_i = Y_i - Z_i$. Due to growing in the same pot, $Y_i$ and $Z_i$ may be dependent
random variables, but it seems appropriate to assume independence between pots, i.e.,
independence between the paired random vectors. So we assume that $X_1, \ldots, X_{15}$
form a random sample. As a tentative model, consider the location model


$$X_i = \mu + e_i, \quad i = 1, \ldots, 15,$$

where the random variables $e_i$ are iid with continuous density $f(x)$. For this model,
there is no loss in generality in assuming that the mean of $e_i$ is 0, for, otherwise, we
can simply redefine $\mu$. Hence, $E(X_i) = \mu$. Further, the density of $X_i$ is $f_X(x; \mu) =
f(x - \mu)$. In practice, the goodness of the model is always a concern and diagnostics
based on the data would be run to confirm the quality of the model.
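
The paired-difference construction and the location model can be sketched in a few lines of Python. The data below are simulated, not Darwin's measurements. The value of $\mu$, the baseline height, the normal errors, and the shared pot effect are all hypothetical choices made only to show how dependence within a pot drops out of $X_i = Y_i - Z_i$.

```python
import random
import statistics

rng = random.Random(1)
mu = 2.0      # hypothetical location parameter (not estimated from real data)
n_pots = 15   # number of pots, as in the example

pairs = []
for _ in range(n_pots):
    # A shared pot effect makes Y_i and Z_i dependent within a pot,
    # while different pots are generated independently.
    pot_effect = rng.gauss(0.0, 3.0)
    y = 20.0 + mu + pot_effect + rng.gauss(0.0, 1.0)  # cross-fertilized height
    z = 20.0 + pot_effect + rng.gauss(0.0, 1.0)       # self-fertilized height
    pairs.append((y, z))

# Differences X_i = Y_i - Z_i: the pot effect cancels, leaving mu plus a
# mean-zero error, which is the location model X_i = mu + e_i.
x = [y - z for y, z in pairs]

x_bar = statistics.fmean(x)            # sample mean, a natural estimate of mu
residuals = [xi - x_bar for xi in x]   # raw material for model diagnostics

print(f"sample mean of differences: {x_bar:.3f}")
print(f"sample std. dev. of differences: {statistics.stdev(x):.3f}")
```

With real data, the residuals would feed whatever diagnostics are used to judge the model, for example a plot to check for outliers or marked skewness.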
If $\mu = E(X_i) = 0$, then $E(Y_i) = E(Z_i)$; i.e., on average, the cross-fertilized
plants grow to the same height as the self-fertilized plants. While, if $\mu > 0$ then
