406 Statistical Methods
the mean price in Chicago is $27.25 above the mean in San Francisco. The
reason for the difference is an outlier in the sample of Chicago room prices.
This outlier has a big effect on the Chicago mean price, but not on the
median. The median is much more robust to the effect of outliers.
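The following small sketch makes the point concrete. It uses Python's standard `statistics` module. The prices are hypothetical and are not the hotel data from this chapter. The last value in the second list is a deliberately planted outlier.

```python
import statistics

# Hypothetical nightly room prices in dollars (illustrative only,
# not the chapter's hotel data).
prices = [98, 105, 110, 112, 120]

# Same sample with the largest price replaced by an extreme outlier.
prices_with_outlier = prices[:-1] + [420]

mean_before = statistics.mean(prices)
median_before = statistics.median(prices)
mean_after = statistics.mean(prices_with_outlier)
median_after = statistics.median(prices_with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

A single outlier moves the mean from 109 to 169. The median stays at 110, because it depends only on the middle value of the sorted sample.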
One-Way Analysis of Variance and Regression
You can think of analysis of variance as a special form of regression. In the
case of analysis of variance, the predictor variables are discrete rather than
continuous. Still, you can express an analysis of variance in terms of
regression and, in doing so, gain additional insight into the data. To do this,
you have to reformulate the model.
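The sketch below illustrates that equivalence using NumPy and made-up prices for three hypothetical groups. It first computes the one-way ANOVA F statistic from the usual between-group and within-group sums of squares. It then refits the same data as a least-squares regression on 0/1 indicator columns. Dropping one group's column, so that the least-squares solution is unique, is a choice made for this sketch.

```python
import numpy as np

# Hypothetical prices for three groups (illustrative, not the book's data).
groups = [
    np.array([100.0, 110.0, 120.0]),
    np.array([130.0, 140.0, 150.0, 160.0]),
    np.array([90.0, 95.0, 100.0]),
]
k = len(groups)
y = np.concatenate(groups)
n = y.size
labels = np.concatenate([np.full(g.size, i) for i, g in enumerate(groups)])

# Classic one-way ANOVA: between-group and within-group sums of squares.
grand_mean = y.mean()
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
f_anova = (ss_between / (k - 1)) / (ss_within / (n - k))

# Same analysis as a regression: a column of ones plus 0/1 indicator
# columns for groups 1..k-1 (group 0's column is dropped).
X = np.column_stack(
    [np.ones(n)] + [(labels == i).astype(float) for i in range(1, k)]
)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
sse = ((y - fitted) ** 2).sum()             # residual sum of squares
ssr = ((fitted - grand_mean) ** 2).sum()    # regression sum of squares
f_regression = (ssr / (k - 1)) / (sse / (n - k))

print("ANOVA F:     ", f_anova)
print("regression F:", f_regression)
```

The two F statistics agree because the regression's fitted values are exactly the group means. The regression and residual sums of squares therefore equal the between-group and within-group sums of squares.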
Earlier in this chapter you were introduced to the means model
$$y = \mu_i + \epsilon$$
for the $i$th treatment group. An equivalent way to express this relationship
is with the effects model
$$y = \mu + \alpha_i + \epsilon$$
Here $\mu$ is a mean term, $\alpha_i$ is the effect from the $i$th treatment group, and $\epsilon$
is a normally distributed error term with mean 0 and variance $\sigma^2$.
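The two models can be linked by taking the expected value of $y$ in the $i$th group under each one. Because the error term has mean 0:
$$E(y) = \mu_i = \mu + \alpha_i, \qquad \text{so} \qquad \alpha_i = \mu_i - \mu$$
Each effect term therefore measures how far the mean of the $i$th treatment group lies from the mean term $\mu$.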
Let’s apply this equation to the hotel data. In this data set there are four
groups representing the four cities, so you would expect the effects model
to have a mean term $\mu$ and four effect terms $\alpha_1$, $\alpha_2$, $\alpha_3$, and $\alpha_4$
representing the four cities. There is a problem, however: you have five
parameters in your model, but you are estimating only four mean values. This
is an example of an overparameterized model, which has more parameters than
there are distinct group means to estimate. As a result, infinitely many
combinations of parameter values fit the data equally well: adding any
constant to $\mu$ and subtracting it from every $\alpha_i$ leaves each group
mean $\mu + \alpha_i$ unchanged. To correct this problem, you have to reduce
the number of free parameters. Statistical packages generally do this in one
of two ways: either they constrain the effect terms so that their sum is zero,
or they define one of the effect terms to be zero (Milliken and Johnson, 1984).
Let’s apply this second approach to the hotel data and perform the analysis of
variance using regression modeling.
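The overparameterization can be checked numerically. This sketch uses NumPy and hypothetical prices for four groups that stand in for the four cities, with two observations each. These are not the book's data. The code builds the five-column design matrix for the effects model and confirms that its rank is only 4. It then shows two different parameter vectors that give identical fitted values. Finally, it applies each of the two constraints. Setting $\alpha_1 = 0$ is an arbitrary choice for illustration. A given package may set a different group's effect to zero instead.

```python
import numpy as np

# Hypothetical data: four groups (standing in for four cities), two prices each.
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
y = np.array([100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0])
n, k = labels.size, 4

# One 0/1 indicator column per group.
indicators = np.column_stack([(labels == i).astype(float) for i in range(k)])

# Effects model y = mu + alpha_i + e: a column of ones plus all four indicators.
X_full = np.column_stack([np.ones(n), indicators])
rank_full = np.linalg.matrix_rank(X_full)  # 4, not 5: ones = sum of indicators

# Shift mu up by 5 and every alpha_i down by 5: the fitted values do not change.
beta_a = np.array([100.0, 10.0, 20.0, 30.0, 40.0])
beta_b = beta_a + np.array([5.0, -5.0, -5.0, -5.0, -5.0])
same_fit = np.allclose(X_full @ beta_a, X_full @ beta_b)

# Constraint 1 (illustrative choice): alpha_1 = 0, i.e. drop the first indicator.
X_ref = np.column_stack([np.ones(n), indicators[:, 1:]])
coef_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)

# Constraint 2: alpha_1 + ... + alpha_4 = 0, so alpha_4 = -(alpha_1 + alpha_2 + alpha_3).
# Substituting gives the columns (indicator_i - indicator_4) for i = 1, 2, 3.
X_sum = np.column_stack(
    [np.ones(n)] + [indicators[:, i] - indicators[:, 3] for i in range(3)]
)
coef_sum, *_ = np.linalg.lstsq(X_sum, y, rcond=None)

print("rank of 5-column design:", rank_full)
print("same fitted values:", same_fit)
print("alpha_1 = 0 coefficients:", coef_ref)  # [mu, alpha_2, alpha_3, alpha_4]
print("sum-to-zero coefficients:", coef_sum)  # [mu, alpha_1, alpha_2, alpha_3]
```

With $\alpha_1 = 0$, the mean term becomes the first group's mean (105 here), and each remaining coefficient is that group's difference from it. With the sum-to-zero constraint, the mean term is the average of the four group means (135 here).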
Indicator Variables
To perform the analysis of variance using regression modeling, you can create
indicator variables for the data. Indicator variables take on values of
either 1 or 0, depending on whether the data belong to a certain treatment
group or not. For example, you can create an indicator variable where the