Regression Models with Categorical Variables 137
Dependent Categorical Variables
Thus far we have discussed models where the explanatory variables can be either quantitative or categorical while the dependent variable is quantitative. Let's now discuss models where the dependent variable is categorical.
Recall that a regression model can be interpreted as a conditional probability distribution. Suppose that the dependent variable is a categorical variable Y that can assume two values, which we conventionally represent as 0 and 1. The probability distribution of the dependent variable is then a discrete function:
$$P(Y = 1) = p, \qquad P(Y = 0) = q = 1 - p$$
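As a small illustration of this two-point distribution, the Python sketch below evaluates P(Y = y) for y = 0 and y = 1 and checks that the mean of Y equals p, since E[Y] = 1·p + 0·q = p. That identity is what lets a regression on a 0/1 dependent variable be read as a model of the probability p. The function name `bernoulli_pmf` and the value p = 0.25 are hypothetical choices for illustration.

```python
def bernoulli_pmf(y: int, p: float) -> float:
    """Return P(Y = y) for a 0/1 variable Y with P(Y = 1) = p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if y == 1:
        return p
    if y == 0:
        return 1.0 - p  # q = 1 - p
    raise ValueError("y must be 0 or 1")


p = 0.25
q = bernoulli_pmf(0, p)
# The mean of Y equals p: E[Y] = 1 * p + 0 * q = p.
mean_y = sum(y * bernoulli_pmf(y, p) for y in (0, 1))
print(bernoulli_pmf(1, p), q, mean_y)  # 0.25 0.75 0.25
```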
A regression model where the dependent variable is a categorical variable is therefore a probability model; that is, it is a model of the probability p given the values of the explanatory variables X:
$$P(Y = 1 \mid X) = f(X)$$
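To make the idea of a conditional probability model concrete, here is a minimal Python sketch, assuming a made-up f(X) for a single explanatory variable taking values in [0, 1]. For a fixed x, Y equals 1 with probability f(x) and 0 otherwise, so the average of many simulated draws of Y approaches f(x). The functions `f` and `draw_y` and the coefficients 0.1 and 0.6 are hypothetical.

```python
import random


def f(x: float) -> float:
    """Hypothetical f(X): probability that Y = 1 given X = x, for x in [0, 1]."""
    return 0.1 + 0.6 * x  # stays within [0.1, 0.7] on the assumed domain


def draw_y(x: float, rng: random.Random) -> int:
    """Draw Y from its conditional distribution: Y = 1 with probability f(x)."""
    return 1 if rng.random() < f(x) else 0


rng = random.Random(42)  # fixed seed so the simulation is reproducible
x = 0.5
draws = [draw_y(x, rng) for _ in range(100_000)]
# The sample mean of Y given x estimates P(Y = 1 | X = x) = f(x).
print(f(x), sum(draws) / len(draws))
```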
In the following sections we will discuss three probability models: the linear probability model, the probit regression model, and the logit regression model.
Linear Probability Model
The linear probability model assumes that the function f(X) is linear. For example, a linear probability model of default assumes that there is a linear relationship between the probability of default and the factors that determine default:
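$$P(Y = 1 \mid X) = f(X) = \beta_0 + \beta_1 X_1 + \cdots + \beta_k X_k$$
where X_1, ..., X_k are the explanatory variables and β_0, ..., β_k are the model parameters.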
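Once parameter values are available, the model's predicted probability is simply the linear combination of the explanatory variables. The sketch below evaluates it in Python. The two default factors and the coefficient values are invented for illustration and do not come from any estimated model; the function name `lpm_probability` is likewise hypothetical.

```python
from typing import Sequence


def lpm_probability(x: Sequence[float], beta: Sequence[float]) -> float:
    """Linear probability model: beta_0 + beta_1*x_1 + ... + beta_k*x_k."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one more entry (the intercept) than x")
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


# Hypothetical coefficients for two default factors (for example, a leverage
# ratio and a profitability ratio); the numbers are made up for illustration.
beta = [0.02, 0.25, -0.10]
x = [0.4, 0.5]
prob = lpm_probability(x, beta)
print(prob)  # approximately 0.07 = 0.02 + 0.25*0.4 - 0.10*0.5
```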
The parameters of the model can be estimated by ordinary least squares, applying the estimation methods for multiple regression models discussed in the previous chapter. Once the parameters of the model are estimated, the predicted value of P(Y = 1 | X) can be interpreted as the probability of the event, such as the probability of default in our previous example. Note, however, that when a linear probability model is used, the R² can be used as described in the previous chapter only if all the explanatory variables are also binary variables.
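The estimation step can be sketched with NumPy's least-squares solver `np.linalg.lstsq` on simulated data. The data-generating process (one explanatory variable drawn uniformly on [0, 1] and a true probability 0.1 + 0.5x) is an assumption chosen so that the fitted coefficients can be checked against known values; with a real default data set, the columns of the design matrix would be the observed default factors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Synthetic data (assumption): one explanatory variable on [0, 1] and a true
# linear probability P(Y = 1 | X = x) = 0.1 + 0.5 * x.
x = rng.uniform(0.0, 1.0, size=n)
p_true = 0.1 + 0.5 * x
y = (rng.uniform(0.0, 1.0, size=n) < p_true).astype(float)  # 0/1 outcomes

# Ordinary least squares: regress the 0/1 variable on a constant and x.
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values are read as estimated event probabilities.
p_hat = X @ beta_hat
print("estimated coefficients:", beta_hat)
print("predicted P(Y = 1 | x = 0.8):", beta_hat[0] + beta_hat[1] * 0.8)
```

Because the regression includes a constant, the OLS residuals sum to zero, so the fitted probabilities average to the observed frequency of Y = 1 in the sample.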