482 Discrete Choice Modeling
We can approach the specification in equation (11.2) from a different viewpoint.
The random utility approach specifies that $d_i^*$ represents the strength of the
individual's preference for alternative 1 relative to alternative 2. An alternative
approach regards (11.2) as a latent regression model. The dependent variable is
assumed to be unobservable; the observation is a censored variable that measures
$d_i^*$ relative to a benchmark, zero. For an example, consider a model of loan
default. One would not typically think of loan default as a utility maximizing
choice. On the other hand, in the context of (11.2), one might think of $d_i^*$ as
a latent measure of the financial distress of individual $i$. If $d_i^*$ is high
enough, the individual defaults, and we observe $d_i = 1$. By this construction,
the appropriate model for $d_i$ is a censored regression. Once we endow
$\varepsilon_i$ with a proper probability distribution, (11.2) can be construed as
a regression model.
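The latent regression view can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the single scalar covariate `x`, the coefficient `beta`, and the standard normal distribution for $\varepsilon_i$ are hypothetical choices made here, not part of (11.2) itself. The latent distress measure is generated and then discarded; only the indicator $d_i = 1$ when $d_i^* > 0$ is kept, as it would be in observed data.

```python
import random


def simulate_default_share(x, beta, n_draws=10000, seed=0):
    """Share of simulated individuals with d = 1, where d = 1 if d* > 0.

    Assumptions (illustrative only): d* = beta * x + eps, eps ~ N(0, 1).
    """
    rng = random.Random(seed)
    defaults = 0
    for _ in range(n_draws):
        d_star = beta * x + rng.gauss(0.0, 1.0)  # latent distress, never observed
        d = 1 if d_star > 0 else 0               # observed default indicator
        defaults += d
    return defaults / n_draws


# Two hypothetical borrowers: low and high values of the covariate.
share_low = simulate_default_share(x=-1.0, beta=1.0)
share_high = simulate_default_share(x=1.0, beta=1.0)
print(share_low, share_high)
```

Under these assumptions the simulated default share is larger for the borrower with the larger index, which is all the observed data can reveal about $d_i^*$.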
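With the assumption of a specific distribution for $\varepsilon_i$, we obtain a
statement of the choice probabilities:
\[
\begin{aligned}
\text{Prob}(d_i = 1 \mid X_i, z_i) &= \text{Prob}(d_i^* > 0 \mid X_i, z_i) \\
&= \text{Prob}(x_i'\beta + z_i'\gamma + \varepsilon_i > 0) \\
&= \text{Prob}[\varepsilon_i > -(x_i'\beta + z_i'\gamma)] \\
&= 1 - \text{Prob}[\varepsilon_i \le -(x_i'\beta + z_i'\gamma)].
\end{aligned}
\]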
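The last line of this derivation can be evaluated directly once a distribution function for $\varepsilon_i$ is chosen. The following is a minimal sketch, assuming two common candidates (standard normal and standard logistic) purely for illustration; the function names and the value of the index $x_i'\beta + z_i'\gamma$ are hypothetical.

```python
import math


def normal_cdf(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def logistic_cdf(t):
    """Standard logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))


def choice_probability(index, cdf):
    """Prob(d = 1) = 1 - F(-index), where index stands for x'beta + z'gamma."""
    return 1.0 - cdf(-index)


index = 0.5  # hypothetical value of x'beta + z'gamma
p_normal = choice_probability(index, normal_cdf)
p_logistic = choice_probability(index, logistic_cdf)
print(p_normal, p_logistic)
```

Both assumed distributions are symmetric about zero, so in these two cases $1 - F(-t) = F(t)$ and the probability reduces to the distribution function evaluated at the index.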
It follows that:
\[
\begin{aligned}
E[d_i \mid X_i, z_i] &= 0 \times \text{Prob}(d_i = 0 \mid X_i, z_i) + 1 \times \text{Prob}(d_i = 1 \mid X_i, z_i) \\
&= \text{Prob}(d_i = 1 \mid X_i, z_i),
\end{aligned}
\]
so we now have a regression model to manipulate as well. The implied probability
endowed by our assumption of the distribution of $\varepsilon_i$ becomes the
regression of $d_i$ on $X_i$ and $z_i$. By this construction, one might bypass the
random utility apparatus, and simply embark on modeling:
\[
\begin{aligned}
d_i &= E[d_i \mid X_i, z_i] + a_i \\
&= \text{Prob}(d_i = 1 \mid X_i, z_i) + a_i,
\end{aligned}
\]
where, by construction, $a_i$ has zero mean, conditioned on the probability function.
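The zero-mean property can be checked in one step. Writing $P_i$ as shorthand for $\text{Prob}(d_i = 1 \mid X_i, z_i)$ (a symbol introduced here only for convenience), the disturbance $a_i = d_i - P_i$ takes the value $1 - P_i$ with probability $P_i$ and the value $-P_i$ with probability $1 - P_i$, so that
\[
\begin{aligned}
E[a_i \mid X_i, z_i] &= P_i(1 - P_i) + (1 - P_i)(-P_i) = 0, \\
\text{Var}[a_i \mid X_i, z_i] &= P_i(1 - P_i)^2 + (1 - P_i)P_i^2 = P_i(1 - P_i).
\end{aligned}
\]
The second line shows that the variance of $a_i$ changes with $P_i$, so the disturbance in this regression is heteroskedastic.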
A remaining step is to construct the appropriate conditional mean function. In some
settings, this specification has suggested the linear probability model:
\[
d_i = x_i'\beta + z_i'\gamma + a_i.
\]
(See, e.g., Aldrich and Nelson, 1984; Caudill, 1988; Heckman and Snyder, 1997;
Angrist, 2001.) The linear probability model has some significant shortcomings,
the most important of which is that the linear function cannot be constrained to
lie between zero and one, so its interpretation as a probability model is suspect.
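The boundary problem is easy to reproduce on simulated data. The sketch below is an illustration under stated assumptions, not a method from the references cited: the data-generating process (one covariate drawn uniformly from $[-3, 3]$, unit coefficient, standard normal disturbance) and the helper `ols_simple` are hypothetical. It fits the linear probability model by least squares and counts fitted values that fall outside the unit interval.

```python
import random


def ols_simple(x, y):
    """Least squares intercept and slope for a single regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)
x = [rng.uniform(-3.0, 3.0) for _ in range(2000)]
# Assumed data-generating process: d = 1 if x + eps > 0, eps ~ N(0, 1).
d = [1 if xi + rng.gauss(0.0, 1.0) > 0 else 0 for xi in x]

intercept, slope = ols_simple(x, d)
fitted = [intercept + slope * xi for xi in x]

# Fitted "probabilities" below zero or above one.
n_outside = sum(1 for f in fitted if f < 0.0 or f > 1.0)
print(intercept, slope, n_outside)
```

In this design the fitted line necessarily leaves the unit interval for covariate values far enough from the center of the data, which is the shortcoming described above.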
With few exceptions, including those noted above, researchers have employed