484 Discrete Choice Modeling
The log-likelihood function for the observed data is:
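$$
\begin{aligned}
\ln L &= \sum_{i=1}^{n} \ln \operatorname{Prob}(d_i \mid x_i, z_i) \\
&= \sum_{d_i=1} \ln \operatorname{Prob}(d_i = 1 \mid x_i, z_i) + \sum_{d_i=0} \ln \operatorname{Prob}(d_i = 0 \mid x_i, z_i) \\
&= \sum_{i=1}^{n} \ln F\left[(2d_i - 1)(x_i'\beta + z_i'\gamma)\right].
\end{aligned}
$$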
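As a concrete illustration of the last line of the log-likelihood, the following Python sketch evaluates ln L for a logit model and checks it against the two-sum form. The logistic c.d.f. is only one admissible choice of a symmetric F. The function names, the toy data and the stacking of the regressors into a single vector per observation are assumptions made for this example, not part of the text.

```python
import math

def logistic_cdf(t):
    """Logistic c.d.f., an illustrative choice of the symmetric F."""
    return 1.0 / (1.0 + math.exp(-t))

def log_likelihood(theta, d, w):
    """ln L = sum_i ln F[(2 d_i - 1) * index_i], index_i = x_i'beta + z_i'gamma."""
    total = 0.0
    for d_i, w_i in zip(d, w):
        q_i = 2 * d_i - 1  # +1 when d_i = 1, -1 when d_i = 0
        index = sum(a * b for a, b in zip(w_i, theta))
        total += math.log(logistic_cdf(q_i * index))
    return total

def log_likelihood_two_sums(theta, d, w):
    """The same quantity written as separate terms for d_i = 1 and d_i = 0."""
    total = 0.0
    for d_i, w_i in zip(d, w):
        p_i = logistic_cdf(sum(a * b for a, b in zip(w_i, theta)))
        total += math.log(p_i) if d_i == 1 else math.log(1.0 - p_i)
    return total

# Hypothetical toy data: each row holds a constant, one x and one z.
d = [1, 0, 1, 1, 0]
w = [
    [1.0, 0.5, 1.2],
    [1.0, -1.0, 0.3],
    [1.0, 2.0, -0.7],
    [1.0, 0.1, 0.0],
    [1.0, -0.4, 1.5],
]
theta = [0.2, 0.8, -0.5]  # arbitrary trial values for (beta, gamma)

compact = log_likelihood(theta, d, w)
two_sums = log_likelihood_two_sums(theta, d, w)
```

The two functions agree because the logistic distribution is symmetric, so Prob(d_i = 0) = 1 - F(index_i) = F(-index_i). For an asymmetric F the compact form would not apply.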
Estimation by maximizing the log-likelihood is straightforward for this model. The
gradient of the log-likelihood is:
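$$
\frac{\partial \ln L}{\partial \begin{pmatrix} \beta \\ \gamma \end{pmatrix}}
= \sum_{i=1}^{n} (2d_i - 1)\,
\frac{F'\left[(2d_i - 1)(x_i'\beta + z_i'\gamma)\right]}{F\left[(2d_i - 1)(x_i'\beta + z_i'\gamma)\right]}
\begin{pmatrix} x_i \\ z_i \end{pmatrix}
= \sum_{i=1}^{n} g_i = g.
$$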
The maximum likelihood estimators of the parameters are found by equating $g$ to
zero, an optimization problem that requires an iterative solution.^6 For convenience
in what follows, we will define:
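$$
q_i = 2d_i - 1, \quad
w_i = \begin{pmatrix} x_i \\ z_i \end{pmatrix}, \quad
\theta = \begin{pmatrix} \beta \\ \gamma \end{pmatrix}, \quad
t_i = q_i w_i'\theta, \quad
F_i = F(t_i), \quad
F_i' = dF_i/dt_i = f_i.
$$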
(Thus, $F_i$ is the cumulative distribution function (c.d.f.) and $f_i$ is the density for the
assumed distribution.) It follows that:
$$
g_i = q_i \frac{F_i'}{F_i} w_i = q_i \frac{f_i}{F_i} w_i.
$$
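The score can be checked numerically. The sketch below, again assuming a logistic F purely for illustration, accumulates the per-observation terms q_i (f_i / F_i) w_i, which are the summands of the gradient above, and compares the total with a central finite-difference approximation of the log-likelihood's derivative. All function names, the step size and the toy data are hypothetical.

```python
import math

def F(t):
    """Logistic c.d.f. (illustrative choice of a symmetric F)."""
    return 1.0 / (1.0 + math.exp(-t))

def f(t):
    """Logistic density, F'(t) = F(t) * (1 - F(t))."""
    return F(t) * (1.0 - F(t))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def log_likelihood(theta, d, w):
    return sum(math.log(F((2 * d_i - 1) * dot(w_i, theta)))
               for d_i, w_i in zip(d, w))

def score(theta, d, w):
    """g = sum_i q_i * (f_i / F_i) * w_i, with f_i and F_i evaluated at t_i."""
    g = [0.0] * len(theta)
    for d_i, w_i in zip(d, w):
        q_i = 2 * d_i - 1
        t_i = q_i * dot(w_i, theta)
        scale = q_i * f(t_i) / F(t_i)
        for j, w_ij in enumerate(w_i):
            g[j] += scale * w_ij
    return g

def numerical_gradient(theta, d, w, h=1e-6):
    """Central differences, used only to check the analytic score."""
    grad = []
    for j in range(len(theta)):
        up = list(theta)
        down = list(theta)
        up[j] += h
        down[j] -= h
        grad.append((log_likelihood(up, d, w) - log_likelihood(down, d, w)) / (2 * h))
    return grad

# Hypothetical toy data: each w_i stacks a constant, one x and one z.
d = [1, 0, 1, 1, 0]
w = [
    [1.0, 0.5, 1.2],
    [1.0, -1.0, 0.3],
    [1.0, 2.0, -0.7],
    [1.0, 0.1, 0.0],
    [1.0, -0.4, 1.5],
]
theta = [0.2, 0.8, -0.5]

analytic = score(theta, d, w)
numeric = numerical_gradient(theta, d, w)
max_gap = max(abs(a - b) for a, b in zip(analytic, numeric))
```

For the logistic case f_i / F_i simplifies to 1 - F_i, so the score reduces to the familiar sum of q_i (1 - F_i) w_i. The code keeps the general ratio so that another symmetric F could be substituted.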
Statistical inference about the parameters is made using one of the three
conventional estimators of the asymptotic covariance matrix: the Berndt, Hall, Hall
and Hausman (1974) (BHHH) estimator, based on the outer products of the first
derivatives:
$$
V_{BHHH} = \left[\sum_{i=1}^{n} g_i g_i'\right]^{-1},
$$
the actual Hessian:
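$$
V_H = \left[-\sum_{i=1}^{n} \frac{\partial^2 \ln F_i}{\partial\theta\,\partial\theta'}\right]^{-1}
= \left[-\sum_{i=1}^{n} \frac{F_i F_i'' - (F_i')^2}{F_i^2}\, w_i w_i'\right]^{-1},
$$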
or the expected Hessian, which can be shown to equal:
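$$
V_{EH} = \left[-\sum_{i=1}^{n} E_{d_i}\!\left(\frac{\partial^2 \ln F_i}{\partial\theta\,\partial\theta'}\right)\right]^{-1}
= \left[\sum_{i=1}^{n} \frac{f(w_i'\theta)\, f(-w_i'\theta)}{F_i (1 - F_i)}\, w_i w_i'\right]^{-1}.
$$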
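The three estimators can be compared on a small example. The following sketch assumes a logit model, so F is the logistic c.d.f., fits it by Newton's method as one possible iterative solution, and then forms the three covariance matrices from per-observation weights on w_i w_i'. The expected Hessian weights enter with a positive sign so that the matrix being inverted is positive definite. The helper names, the data and the restriction of w_i to a constant and a single regressor are all assumptions made for brevity.

```python
import math

def F(t):
    """Logistic c.d.f. (illustrative choice of a symmetric F)."""
    return 1.0 / (1.0 + math.exp(-t))

def f(t):
    """Logistic density F'(t)."""
    return F(t) * (1.0 - F(t))

def f_prime(t):
    """F''(t) = f(t) * (1 - 2 F(t)) for the logistic distribution."""
    return f(t) * (1.0 - 2.0 * F(t))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_cross_product(weights, w):
    """sum_i a_i * w_i w_i' as a list of lists."""
    k = len(w[0])
    m = [[0.0] * k for _ in range(k)]
    for a_i, w_i in zip(weights, w):
        for r in range(k):
            for c in range(k):
                m[r][c] += a_i * w_i[r] * w_i[c]
    return m

def invert(a):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def score(theta, d, w):
    """g = sum_i q_i * (f_i / F_i) * w_i."""
    g = [0.0] * len(theta)
    for d_i, w_i in zip(d, w):
        q_i = 2 * d_i - 1
        t_i = q_i * dot(w_i, theta)
        for j, w_ij in enumerate(w_i):
            g[j] += q_i * f(t_i) / F(t_i) * w_ij
    return g

def hessian_weights(theta, d, w):
    """-(F_i F_i'' - (F_i')^2) / F_i^2 for each observation."""
    out = []
    for d_i, w_i in zip(d, w):
        t_i = (2 * d_i - 1) * dot(w_i, theta)
        out.append(-(F(t_i) * f_prime(t_i) - f(t_i) ** 2) / F(t_i) ** 2)
    return out

def fit_newton(d, w, max_iter=50, tol=1e-10):
    """Newton iterations theta <- theta + V_H g, started from zero."""
    theta = [0.0] * len(w[0])
    for _ in range(max_iter):
        g = score(theta, d, w)
        if max(abs(v) for v in g) < tol:
            break
        v_h = invert(weighted_cross_product(hessian_weights(theta, d, w), w))
        theta = [th + dot(row, g) for th, row in zip(theta, v_h)]
    return theta

def covariance_estimators(theta, d, w):
    """Return (V_BHHH, V_H, V_EH) evaluated at theta."""
    bhhh, expected = [], []
    for d_i, w_i in zip(d, w):
        s_i = dot(w_i, theta)
        t_i = (2 * d_i - 1) * s_i
        bhhh.append((f(t_i) / F(t_i)) ** 2)  # g_i g_i' = (f_i / F_i)^2 w_i w_i'
        expected.append(f(s_i) * f(-s_i) / (F(t_i) * (1.0 - F(t_i))))
    v_bhhh = invert(weighted_cross_product(bhhh, w))
    v_h = invert(weighted_cross_product(hessian_weights(theta, d, w), w))
    v_eh = invert(weighted_cross_product(expected, w))
    return v_bhhh, v_h, v_eh

# Hypothetical data with overlapping outcomes, so the MLE exists.
x = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
d = [0, 0, 1, 0, 1, 0, 1, 1]
w = [[1.0, x_i] for x_i in x]

theta_hat = fit_newton(d, w)
v_bhhh, v_h, v_eh = covariance_estimators(theta_hat, d, w)
```

In the logit case the actual and expected Hessians coincide, because the second derivatives do not involve d_i, so v_h and v_eh agree up to rounding while v_bhhh generally differs in finite samples. For other distributions such as the normal, the three matrices are all different.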
It has become common, even de rigueur, to compute a “robust” covariance matrix
for the MLE using $V_H \times V_{BHHH}^{-1} \times V_H$, under the assumption that the MLE is robust
to failures of the specification of the model. In fact, there is no obvious failure of