Introductory Biostatistics


many tests increases the probability that one or more of the comparisons will
result in a type I error (i.e., a significant test result when the null hypothesis is
true). This statement should make sense intuitively. For example, suppose that
the null hypothesis is true and we perform 100 tests, each with a 0.05 probability
of resulting in a type I error; then, on average, about 5 of these 100 tests would
be statistically significant as the result of type I errors. Of course, we usually
do not need to do that many tests; however, every time we do more than one, the
probability that at least one will result in a type I error (a falsely significant
difference) exceeds 0.05! What is needed is a different way to summarize the
differences between several means and a method of simultaneously comparing
these means in one step. This method is called ANOVA or one-way ANOVA,
an abbreviation of analysis of variance.
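
The growth of the error probability can be illustrated with a short calculation. The sketch below assumes that the tests are independent, which the text does not state; under that assumption the probability of at least one type I error among m tests is 1 - (1 - alpha)^m. The function name `familywise_error_rate` is a hypothetical one chosen for this illustration.

```python
# Probability of at least one type I error among m tests, each run at level alpha.
# Assumption (not from the text): the m tests are mutually independent.
def familywise_error_rate(m, alpha=0.05):
    # P(no type I error in any of the m tests) = (1 - alpha) ** m
    return 1.0 - (1.0 - alpha) ** m


if __name__ == "__main__":
    alpha = 0.05
    for m in (1, 2, 3, 10, 100):
        rate = familywise_error_rate(m, alpha)
        expected = m * alpha  # expected number of false positives
        print(f"{m:>3} tests: P(at least one type I error) = {rate:.4f}, "
              f"expected type I errors = {expected:.2f}")
```

With a single test the probability stays at 0.05; with two independent tests it is already 0.0975, and it keeps rising as more tests are added.
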
We have continuous measurements $X$ from $k$ independent samples; the
sample sizes may or may not be equal. We assume that these are samples from
$k$ normal distributions with a common variance $\sigma^2$, but the means, the $\mu_i$'s, may or
may not be the same. The case where we apply the two-sample $t$ test is a special
case of this one-way ANOVA model with $k = 2$. Data from the $i$th sample can
be summarized into sample size $n_i$, sample mean $\bar{x}_i$, and sample variance $s_i^2$. If
we pool data together, the (grand) mean of this combined sample can be
calculated from



\[
\bar{x} = \frac{\sum_i n_i \bar{x}_i}{\sum_i n_i}
\]
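
As a minimal sketch of this formula, the grand mean can be computed from the per-sample sizes and means alone. The function name `grand_mean` and the three small samples summarized below are hypothetical, chosen only for illustration.

```python
# Grand mean of the pooled sample from per-sample summaries:
# the sum of n_i * xbar_i divided by the sum of n_i.
def grand_mean(sizes, means):
    if len(sizes) != len(means):
        raise ValueError("sizes and means must have the same length")
    n = sum(sizes)  # size of the combined sample
    return sum(n_i * xbar_i for n_i, xbar_i in zip(sizes, means)) / n


if __name__ == "__main__":
    # Hypothetical summaries of k = 3 samples (not data from the text).
    sizes = [3, 2, 4]
    means = [2.0, 5.0, 8.0]
    print("grand mean:", grand_mean(sizes, means))
    print("unweighted mean of the sample means:", sum(means) / len(means))
```

Because the sample sizes differ, the grand mean here (48/9, about 5.33) is not the simple average of the three sample means (5.0); each sample mean is weighted by its sample size.
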

In that combined sample of size $n = \sum_i n_i$, the variation in $X$ is measured
conventionally in terms of the deviations $(x_{ij} - \bar{x})$ (where $x_{ij}$ is the $j$th
measurement from the $i$th sample); the total variation, denoted by SST, is the sum
of squared deviations:


\[
\mathrm{SST} = \sum_{i,j} (x_{ij} - \bar{x})^2
\]
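
The total sum of squares can be sketched directly from this definition. The function name `total_sum_of_squares` and the data are hypothetical, used only for illustration.

```python
# Total sum of squares: squared deviations of every observation
# from the grand mean of the pooled sample.
def total_sum_of_squares(samples):
    pooled = [x for sample in samples for x in sample]
    grand = sum(pooled) / len(pooled)  # grand mean of the combined sample
    return sum((x - grand) ** 2 for x in pooled)


if __name__ == "__main__":
    # Hypothetical data for k = 3 samples of unequal size (not from the text).
    samples = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 8.0]]
    print("SST:", total_sum_of_squares(samples))

    # When every observation has the same value, there is no variation at all.
    print("SST for identical values:", total_sum_of_squares([[5.0, 5.0], [5.0]]))
```
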

For example, SST $= 0$ when all observations $x_{ij}$ have the same value; SST is the
numerator of the sample variance of the combined sample. The higher the SST
value, the greater the variation among all $X$ values. The total variation in the
combined sample can be decomposed into two components:


\[
x_{ij} - \bar{x} = (x_{ij} - \bar{x}_i) + (\bar{x}_i - \bar{x})
\]


  1. The first term reflects the variation within the $i$th sample; the sum

\[
\mathrm{SSW} = \sum_{i,j} (x_{ij} - \bar{x}_i)^2 = \sum_i (n_i - 1) s_i^2
\]

is called the within sum of squares.
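
The two expressions for SSW can be checked against each other numerically. This is only a sketch: the function names and the data are hypothetical, and it assumes every sample has at least two observations, because `statistics.variance` needs two or more values.

```python
from statistics import mean, variance


# SSW from the definition: squared deviations of each observation
# from its own sample mean, summed over all samples.
def within_ss_direct(samples):
    total = 0.0
    for sample in samples:
        xbar_i = mean(sample)
        total += sum((x - xbar_i) ** 2 for x in sample)
    return total


# SSW from summaries: the sum of (n_i - 1) * s_i^2 over the samples.
# Assumes each sample has n_i >= 2 so that s_i^2 is defined.
def within_ss_from_variances(samples):
    return sum((len(sample) - 1) * variance(sample) for sample in samples)


if __name__ == "__main__":
    # Hypothetical data for k = 3 samples of unequal size (not from the text).
    samples = [[1.0, 2.0, 3.0], [4.0, 6.0], [7.0, 8.0, 9.0, 8.0]]
    print("SSW (direct):        ", within_ss_direct(samples))
    print("SSW (from variances):", within_ss_from_variances(samples))
```

Both routes should agree up to floating-point rounding. The second is convenient when only the sample sizes and sample variances are available.
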

264 COMPARISON OF POPULATION MEANS
