490 Discrete Choice Modeling
constant. It is tempting to suggest that this statistic measures the “contribution” of the variables to the fit of the model. It lies between zero and one, and it does rise unambiguously as variables are added to the model. However, the “fit” interpretation is ambiguous, since the likelihood function is not a fit measure. As a consequence, this measure can be distressingly small in a model that contains numerous precisely measured (highly significant) coefficients (see Wooldridge, 2002a, for discussion).
This does leave open the issue of how to assess the fit of the estimated model to
the data. In order to address this question, the analyst must first decide what rule
will be used to predict the observed outcome using the model, then determine how
successful the model (and rule) are. A natural approach, since the model predicts
probabilities of events, is to use the estimated probability, $F(\mathbf{w}_i'\theta)$. The prediction is based on the rule:
Predict $d_i = 1$ if the estimated $\operatorname{Prob}(d_i = 1 \mid \mathbf{w}_i)$ is greater than $P^*$, (11.3)
where $P^*$ is to be chosen by the analyst. The usual choice of $P^*$ is 0.5, reasoning
that if the model predicts that the event is more likely to occur than not, we should
predict that it will.^9 A summary 2×2 table of the number of cases in which the
rule predicts correctly and incorrectly can be used to assess the fit of the model.
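The tally behind such a table is straightforward to compute. A minimal Python sketch of the idea (the helper name and the data are illustrative assumptions, not from the text), applying the rule in (11.3) with $P^* = 0.5$:

```python
import numpy as np

def prediction_table(p_hat, d, p_star=0.5):
    """2x2 tally of actual outcomes against outcomes predicted by the
    rule: predict d_i = 1 when the fitted probability exceeds p_star."""
    pred = (np.asarray(p_hat) > p_star).astype(int)
    table = np.zeros((2, 2), dtype=int)
    for actual, predicted in zip(np.asarray(d), pred):
        table[actual, predicted] += 1
    return table  # rows: actual d_i; columns: predicted d_i

# Made-up fitted probabilities and outcomes, for illustration only.
p_hat = np.array([0.8, 0.3, 0.6, 0.2, 0.9])
d = np.array([1, 0, 0, 0, 1])
print(prediction_table(p_hat, d))
```

The diagonal cells count the correct predictions; the off-diagonal cells count the two kinds of prediction errors.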
Numerous single-valued functions of this tally have been suggested as counterparts
to $R^2$. For example, Cramer (1999) proposed:
$$\lambda_C = (\text{average } \hat{P}_i \mid d_i = 1) - (\text{average } \hat{P}_i \mid d_i = 0).$$
This measure counts the correct predictions and adds a penalty for incorrect predictions. Other modifications and similar alternatives have been suggested by Efron
(1978), Kay and Little (1986), Ben-Akiva and Lerman (1985) and Zavoina and
McKelvey (1975).
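Cramer's measure is simple to compute from the fitted probabilities. A minimal sketch (the function name and the data are illustrative assumptions, not from the text):

```python
import numpy as np

def cramer_lambda(p_hat, d):
    """Cramer (1999): mean fitted probability among observations with
    d_i = 1, minus mean fitted probability among those with d_i = 0."""
    p_hat = np.asarray(p_hat, dtype=float)
    d = np.asarray(d)
    return p_hat[d == 1].mean() - p_hat[d == 0].mean()

# Made-up fitted probabilities and outcomes, for illustration only.
lam = cramer_lambda([0.8, 0.3, 0.6, 0.2, 0.9], [1, 0, 0, 0, 1])
```

A model that assigns high probabilities to the observed ones and low probabilities to the observed zeros yields a value near one.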
11.3.3 A Bayesian estimator
The preceding section has developed the classical MLE for binomial choice models.
A Bayesian estimator for the probit model illustrates an intriguing technique for
censored data models. The model framework is, as before:
$d_i^* = \mathbf{w}_i'\theta + \varepsilon_i, \quad \varepsilon_i \sim N[0, 1]$, (11.4)
$d_i = 1$ if $d_i^* > 0$, otherwise $d_i = 0$. (11.5)
The data consist of $(d, \mathbf{W}) = (d_i, \mathbf{w}_i), i = 1, \ldots, n$. The random variable $d_i$ has a Bernoulli distribution with probabilities:
$$\operatorname{Prob}[d_i = 1 \mid \mathbf{w}_i] = \Phi(\mathbf{w}_i'\theta),$$
$$\operatorname{Prob}[d_i = 0 \mid \mathbf{w}_i] = 1 - \Phi(\mathbf{w}_i'\theta).$$
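This data-generating process can be simulated directly, which is often useful for checking an estimator. A minimal sketch (the sample size and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 2
theta = np.array([0.5, -1.0])      # illustrative parameter vector
W = rng.normal(size=(n, k))        # covariates w_i
eps = rng.normal(size=n)           # eps_i ~ N[0, 1], as in (11.4)
d_star = W @ theta + eps           # latent index d_i*
d = (d_star > 0).astype(int)       # observed outcome d_i, as in (11.5)
```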
The likelihood function for the observed data, $d$, conditioned on $\mathbf{W}$ and $\theta$, is:
$$L(d \mid \mathbf{W}, \theta) = \prod_{i=1}^{n} [\Phi(\mathbf{w}_i'\theta)]^{d_i} [1 - \Phi(\mathbf{w}_i'\theta)]^{1 - d_i}.$$
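In practice it is the log of this likelihood that is maximized. A minimal Python sketch of the probit log-likelihood (the function name is a hypothetical choice):

```python
import numpy as np
from scipy.stats import norm

def probit_loglik(theta, d, W):
    """Sum over i of d_i * log Phi(w_i'theta) + (1 - d_i) * log(1 - Phi(w_i'theta)).
    Uses log Phi(-z) = log(1 - Phi(z)) for numerical stability."""
    z = W @ np.asarray(theta)
    return np.sum(d * norm.logcdf(z) + (1 - d) * norm.logcdf(-z))
```

This function could be passed (negated) to a numerical optimizer such as scipy.optimize.minimize to obtain the classical MLE discussed in the preceding section.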