Regression Models with Categorical Variables 119
We can introduce the dummy D as well as its interaction with the N
quantitative variables and thus write the following equation:
$$
Y_i = \beta_0 + \gamma D_i + \sum_{j=1}^{N} \beta_j X_{ij} + \sum_{j=1}^{N} \delta_j (D_i X_{ij}) + \varepsilon_i \tag{6.8}
$$
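To make the role of each term in equation (6.8) concrete, the following minimal sketch evaluates the systematic part of the equation (everything except the error term) for one observation. The function name `predict` and all numerical values are hypothetical and chosen only for illustration. The sketch assumes one intercept shift (gamma) and one slope shift per regressor (delta_j) for the category with D = 1.

```python
def predict(x, d, beta0, gamma, beta, delta):
    """Systematic part of equation (6.8) for a single observation.

    x     : list of the N quantitative regressors X_i1, ..., X_iN
    d     : value of the dummy variable (0 or 1)
    beta0 : intercept shared by both categories
    gamma : intercept shift for the category with d = 1
    beta  : list of N slopes shared by both categories
    delta : list of N slope shifts for the category with d = 1
    """
    if d not in (0, 1):
        raise ValueError("the dummy must be 0 or 1")
    if not (len(x) == len(beta) == len(delta)):
        raise ValueError("x, beta, and delta must have the same length")
    base = sum(b * xj for b, xj in zip(beta, x))
    interaction = sum(dl * (d * xj) for dl, xj in zip(delta, x))
    return beta0 + gamma * d + base + interaction


# Hypothetical coefficients for N = 2 regressors (illustrative values only).
beta0, gamma = 1.0, 0.5
beta = [2.0, -1.0]
delta = [0.25, 0.75]
x = [4.0, 2.0]

y_d0 = predict(x, 0, beta0, gamma, beta, delta)  # 1 + (8 - 2) = 7.0
y_d1 = predict(x, 1, beta0, gamma, beta, delta)  # 7 + 0.5 + (1.0 + 1.5) = 10.0
```

For D = 0 the dummy and interaction terms vanish, so the intercept is beta0 and the slopes are beta_j. For D = 1 the intercept becomes beta0 + gamma and each slope becomes beta_j + delta_j.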
The above discussion depends critically on the fact that there are only
two categories, which allows one to use a numerical variable taking the
values 0 and 1 to identify them. However, the process can easily be extended
to multiple categories by adding dummy variables. Suppose there are K > 2
categories. An explanatory variable that distinguishes between more than
two categories is called a polytomous variable.
Suppose there are three categories, A, B, and C. Consider a dummy variable
D1 that assumes the value one on the elements of A and zero on all the others.
Let’s now add a second dummy variable D2 that assumes the value one on the
elements of the category B and zero on all the others. The three categories are
now completely identified: A is identified by the values 1,0 of the two dummy
variables, B by the values 0,1, and C by the values 0,0. Note that the values 1,1
do not identify any category. This process can be extended to any number of
categories. If there are K categories, we need K – 1 dummy variables.
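The coding scheme just described can be sketched in a few lines. The helper `encode_dummies` below is a hypothetical name introduced for illustration. It treats the last category in the list as the baseline, the one identified by all zeros, which mirrors the role of category C above.

```python
def encode_dummies(labels, categories):
    """Encode each label with K - 1 dummy values.

    labels     : category label of each observation
    categories : the K distinct categories; the last one is the baseline
                 and is coded as all zeros
    """
    if len(set(categories)) != len(categories):
        raise ValueError("categories must be distinct")
    non_baseline = categories[:-1]  # one dummy for each of these K - 1 categories
    rows = []
    for label in labels:
        if label not in categories:
            raise ValueError(f"unknown category: {label!r}")
        rows.append([1 if label == c else 0 for c in non_baseline])
    return rows


# Three categories A, B, C give two dummies (D1, D2).
codes = encode_dummies(["A", "B", "C", "A"], ["A", "B", "C"])
# codes == [[1, 0], [0, 1], [0, 0], [1, 0]]
```

Because every observation belongs to exactly one category, at most one dummy in each row equals one, so the combination 1,1 never occurs.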
Statistical Tests
How can we determine whether a given categorization is useful? Many
categorizations will clearly be useless for the purpose of any econometric
regression. If we categorize bonds according to the color of the issuer’s
logo, we will obviously obtain meaningless results. In other cases, however,
distinctions can be subtle and important. Consider the question of market
regime shifts or structural breaks. These are delicate questions that can be
addressed only with appropriate statistical tests.
A word of caution about statistical tests is in order. As observed in
Chapter 2, statistical tests typically work under the assumptions of the model
and might be misleading if these assumptions are violated. For example, if we
try to fit a linear model to a process that is inherently nonlinear, the test
results might be misleading. It is good practice to use several tests and to be
particularly attentive to inconsistencies between test results. Inconsistencies
signal potential problems in applying tests, typically model misspecification.
The t-statistics applied to the regression coefficients of dummy variables
offer a set of important tests to judge which regressors are significant.
Recall from Chapter 2 that the t-statistics are the coefficients divided by