Applied Statistics and Probability for Engineers

CHAPTER 12 MULTIPLE LINEAR REGRESSION

12-6.2 Categorical Regressors and Indicator Variables

The regression models presented in previous sections have been based on quantitative
variables, that is, variables that are measured on a numerical scale. For example, variables such as
temperature, pressure, distance, and voltage are quantitative variables. Occasionally, we need
to incorporate categorical, or qualitative, variables in a regression model. For example,
suppose that one of the variables in a regression model is the operator who is associated with each
observation $y_i$. Assume that only two operators are involved. We may wish to assign different
levels to the two operators to account for the possibility that each operator may have a
different effect on the response.
The usual method of accounting for the different levels of a qualitative variable is to use
indicator variables. For example, to introduce the effect of two different operators into a
regression model, we could define an indicator variable as follows:

$$
x = \begin{cases}
0 & \text{if the observation is from operator 1} \\
1 & \text{if the observation is from operator 2}
\end{cases}
$$

In general, a qualitative variable with $r$ levels can be modeled by $r - 1$ indicator variables,
which are assigned the value of either zero or one. Thus, if there are three operators, the
different levels will be accounted for by the two indicator variables defined as follows:

$$
\begin{array}{ccl}
x_1 & x_2 & \\
\hline
0 & 0 & \text{if the observation is from operator 1} \\
1 & 0 & \text{if the observation is from operator 2} \\
0 & 1 & \text{if the observation is from operator 3}
\end{array}
$$

Indicator variables are also referred to as dummy variables. The following example [from
Montgomery, Peck, and Vining (2001)] illustrates some of the uses of indicator variables; for
other applications, see Montgomery, Peck, and Vining (2001).

EXAMPLE 12-12 A mechanical engineer is investigating the surface finish of metal parts produced on a lathe
and its relationship to the speed (in revolutions per minute) of the lathe. The data are shown in
Table 12-14. Note that the data have been collected using two different types of cutting tools.
Since the type of cutting tool likely affects the surface finish, we will fit the model

$$
Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon
$$

where $Y$ is the surface finish, $x_1$ is the lathe speed in revolutions per minute, and $x_2$ is an
indicator variable denoting the type of cutting tool used; that is,

$$
x_2 = \begin{cases}
0, & \text{for tool type 302} \\
1, & \text{for tool type 416}
\end{cases}
$$

The parameters in this model may be easily interpreted. If $x_2 = 0$, the model becomes

$$
Y = \beta_0 + \beta_1 x_1 + \epsilon
$$

which is a straight-line model with slope $\beta_1$ and intercept $\beta_0$. However, if $x_2 = 1$, the model
becomes

$$
Y = \beta_0 + \beta_1 x_1 + \beta_2 (1) + \epsilon = (\beta_0 + \beta_2) + \beta_1 x_1 + \epsilon
$$
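
The indicator coding and the two-line interpretation described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the text's own computation: the helper names (indicator_columns, solve_linear_system, least_squares) are hypothetical, the first listed level is assumed to be the baseline, and the lathe data are made up and noise-free (generated from Y = 15 + 0.1 x1 - 5 x2). They are not the data of Table 12-14.

```python
# Hypothetical sketch: indicator (dummy) coding and a least-squares fit.
# All data below are made up for illustration; they are NOT Table 12-14.

def indicator_columns(labels, levels):
    """Code a categorical variable with r levels as r - 1 indicator columns.

    The first entry of `levels` is treated as the baseline (all zeros).
    """
    non_baseline = levels[1:]
    return [[1.0 if label == level else 0.0 for level in non_baseline]
            for label in labels]


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Three operators -> two indicator columns (operator 1 is the baseline).
operators = ["op1", "op2", "op3"]
coded = indicator_columns(["op1", "op2", "op3"], operators)
# coded == [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Made-up, noise-free lathe data generated from Y = 15 + 0.1*x1 - 5*x2.
speeds = [200, 250, 300, 200, 250, 300]
tools = ["302", "302", "302", "416", "416", "416"]
x2 = [row[0] for row in indicator_columns(tools, ["302", "416"])]
finish = [15 + 0.1 * s - 5 * t for s, t in zip(speeds, x2)]

# Design matrix columns: intercept, x1 (speed), x2 (tool indicator).
design = [[1.0, float(s), t] for s, t in zip(speeds, x2)]
b0, b1, b2 = least_squares(design, finish)

intercept_302 = b0       # line for tool type 302 (x2 = 0)
intercept_416 = b0 + b2  # line for tool type 416 (x2 = 1), same slope b1
```

Because the made-up responses contain no noise, the fit should recover the generating coefficients up to rounding, and the two tool types share the slope b1 while their intercepts differ by b2. With real data one would normally use a numerical library rather than hand-rolled normal equations; the plain-Python solver is used here only to keep the sketch self-contained.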
