Regression Models with Categorical Variables 117
If $D_i = 0$, the data $X_{i1}$ belong to the first category; if $D_i = 1$,
the data $X_{i1}$ belong to the second category.
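As a minimal sketch of this coding, the snippet below builds a dummy variable from category labels. The rating labels and the set `investment_grade` are hypothetical and serve only as an illustration; they are not data from the text.

```python
import numpy as np

# Hypothetical rating labels for six bond issues (illustrative only).
ratings = np.array(["AA", "BB", "A", "B", "BBB", "CCC"])

# Assumed definition of the first category (investment grade).
investment_grade = {"AAA", "AA", "A", "BBB"}

# D_i = 0 for the first category, D_i = 1 for the second category.
D = np.array([0 if r in investment_grade else 1 for r in ratings])
print(D)  # [0 1 0 1 0 1]
```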
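Consider now the regression equation, written first in matrix form and then
observation by observation:

$$
\begin{aligned}
Y &= XB + E \\
Y_1 &= \beta_0 + \beta_1 X_{11} + \varepsilon_1 \\
&\;\;\vdots \\
Y_T &= \beta_0 + \beta_1 X_{T1} + \varepsilon_T
\end{aligned}
\tag{6.1}
$$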
In financial econometric applications, the index i will be time or a variable
that identifies a cross section of assets, such as bond issues. We can write
three separate regression equations: one for the data that correspond to
D = 1, one for the data that correspond to D = 0, and one for the fully pooled
data. Suppose now that the equations differ in the intercept term but have the
same slope. Let's explicitly write the two equations for the data that
correspond to D = 0 and for the data that correspond to D = 1:
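$$
Y_i =
\begin{cases}
\beta_{00} + \beta_1 X_{i1} + \varepsilon_i, & \text{if } D_i = 0 \\
\beta_{01} + \beta_1 X_{i1} + \varepsilon_i, & \text{if } D_i = 1
\end{cases}
\tag{6.2}
$$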
where $\beta_{00}$ and $\beta_{01}$ are the two intercepts and i indexes the
observations, which belong to the first category when the dummy variable D
assumes value 0 and to the second category when D assumes value 1. If the two
categories are recession and expansion, the first equation might hold in
periods of expansion and the second in periods of recession. If the two
categories are investment-grade bonds and noninvestment-grade bonds, the two
equations apply to different cross sections of bonds, as will be illustrated
in an example later in this chapter.
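To make the two-equation setup in equation (6.2) concrete, the following sketch simulates data whose intercept depends on the dummy variable while the slope is shared, and then fits each category separately. The parameter values, sample size, noise level, and seed are assumptions chosen for illustration, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, for reproducibility only

# Assumed "true" parameters for this sketch (not taken from the text).
beta_00, beta_01, beta_1 = 1.0, 3.0, 0.5
n = 200

X = rng.normal(size=n)                  # quantitative regressor X_i1
D = rng.integers(0, 2, size=n)          # dummy variable: 0 or 1
eps = rng.normal(scale=0.1, size=n)     # error term

# Equation (6.2): the intercept depends on D_i, the slope does not.
Y = np.where(D == 0, beta_00, beta_01) + beta_1 * X + eps

# Fit each category on its own; np.polyfit returns [slope, intercept].
slope_0, intercept_0 = np.polyfit(X[D == 0], Y[D == 0], 1)
slope_1, intercept_1 = np.polyfit(X[D == 1], Y[D == 1], 1)

print("category 0: intercept", intercept_0, "slope", slope_0)
print("category 1: intercept", intercept_1, "slope", slope_1)
```

With these assumed settings, the two fitted intercepts should lie close to 1.0 and 3.0, while the two fitted slopes should both lie close to 0.5.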
Observe now that, under the assumption that only the intercept term
differs in the two equations, the two equations can be combined into a
single equation in the following way:
$$
Y_i = \beta_{00} + \gamma D_i + \beta_1 X_{i1} + \varepsilon_i
\tag{6.3}
$$
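where $\gamma = \beta_{01} - \beta_{00}$ represents the difference of the
intercept for the two categories. Setting $D_i = 0$ gives the intercept
$\beta_{00}$, and setting $D_i = 1$ gives $\beta_{00} + \gamma = \beta_{01}$.
In this way we have defined a single regression equation with two independent
variables, X and D, both treated as quantitative (D takes only the values 0
and 1), to which we can apply all the usual tools of regression analysis,
including the ordinary least squares (OLS) estimation method and all the
tests. By estimating the coefficients of this regression, we obtain the
common slope and the two intercepts. Observe that we would obtain the same
result if the categories were inverted: the roles of the two intercepts are
exchanged and $\gamma$ changes sign, but the fitted equation for each category
is unchanged.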
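The single-equation form (6.3) can be sketched numerically as follows. The example estimates the regression by least squares with `numpy.linalg.lstsq` on simulated data and then refits it with the categories inverted. The parameter values, seed, and the helper `fit` are assumptions introduced for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only

# Assumed "true" parameters for this sketch (not taken from the text).
beta_00, beta_01, beta_1 = 1.0, 3.0, 0.5
n = 200

X = rng.normal(size=n)
D = rng.integers(0, 2, size=n)
Y = np.where(D == 0, beta_00, beta_01) + beta_1 * X + rng.normal(scale=0.1, size=n)

def fit(dummy):
    """Least squares fit of Y on a constant, the dummy, and X (hypothetical helper)."""
    A = np.column_stack([np.ones(n), dummy, X])
    coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return coef, A @ coef  # coefficients and fitted values

# Original coding: coefficients are (beta_00, gamma, beta_1).
(b00, gamma, b1), fitted = fit(D)

# Inverted coding: the dummy 1 - D swaps the two categories.
(b00_inv, gamma_inv, b1_inv), fitted_inv = fit(1 - D)

print("intercepts:", b00, b00 + gamma, "common slope:", b1)
print("inverted coding:", b00_inv, b00_inv + gamma_inv, b1_inv)
```

Under these assumptions, the estimated intercepts `b00` and `b00 + gamma` should be close to 1.0 and 3.0. With the inverted coding, the estimated difference changes sign and the base intercept becomes the other category's intercept, while the slope and the fitted values stay the same.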