William Greene 495
This form of the moment equation, based on observables, can form the basis of a
straightforward two-step generalized method of moments (GMM) estimator.
The GMM estimator is not less parametric than the full information MLE
described below, because the probit model based on the normal distribution is
still invoked to specify the moment equation.^14 Nothing is gained in simplicity or
robustness compared to full information maximum likelihood estimation, which
we now consider. (As Bertschek and Lechner, 1998, argue, however, the gains
might come in terms of practical implementation and computation time. The same
considerations motivated Avery, Hansen and Hotz, 1983.)
The MLE requires a full specification of the model, including the assumption that
underlies the endogeneity of z_i. This becomes essentially a simultaneous equations
model. The model equations are:
\[
\begin{aligned}
d_i^* &= x_i'\beta + \gamma z_i + \varepsilon_i, \qquad d_i = \mathbf{1}[d_i^* > 0],\\
z_i &= w_i'\alpha + u_i,\\
(\varepsilon_i, u_i) &\sim N\!\left[\begin{pmatrix}0\\0\end{pmatrix},\;
\begin{pmatrix}1 & \rho\sigma_u\\ \rho\sigma_u & \sigma_u^2\end{pmatrix}\right].
\end{aligned}
\tag{11.6}
\]
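To see the structure of (11.6) concretely, the model can be simulated directly. The sketch below uses illustrative parameter values that are not from the text; it draws (ε_i, u_i) from the bivariate normal in (11.6) and generates (d_i, z_i):

```python
import numpy as np

# Simulate data from the recursive model in equation (11.6).
# Parameter values are illustrative, not taken from the text.
rng = np.random.default_rng(0)
n = 100_000
beta, gamma = 0.5, 1.0      # structural coefficients
alpha = 1.0                 # instrument coefficient in the z equation
rho, sigma_u = 0.6, 1.0     # Corr[eps, u] and Std[u]

x = rng.normal(size=n)      # exogenous regressor
w = rng.normal(size=n)      # instrumental variable

# Draw (eps, u) jointly normal: Var[eps] = 1, Var[u] = sigma_u^2,
# Cov[eps, u] = rho * sigma_u, as in the covariance matrix of (11.6).
cov = [[1.0, rho * sigma_u], [rho * sigma_u, sigma_u**2]]
eps, u = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

z = alpha * w + u                        # endogenous regressor
d_star = beta * x + gamma * z + eps      # latent index d*_i
d = (d_star > 0).astype(int)             # observed binary outcome
```

Because u enters z directly, the sample correlation between z and ε is nonzero, which is exactly the source of the inconsistency of naive probit discussed next.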
(We are assuming that there is a vector of instrumental variables, w_i.) Probit esti-
mation based on d_i and (x_i, z_i) will not consistently estimate (β, γ) because of the
correlation between z_i and ε_i induced by the correlation between u_i and ε_i. Several
methods have been proposed for estimation of this model. One possibility is to use
the partial reduced form obtained by inserting the second equation into the first.
This becomes a probit model with probability Prob(d_i = 1 | x_i, w_i) = Φ(x_i'β* + w_i'α*).
This will produce consistent estimates of β* = β/(1 + γ²σ_u² + 2γσ_u ρ)^{1/2} and
α* = γα/(1 + γ²σ_u² + 2γσ_u ρ)^{1/2} as the coefficients on x_i and w_i, respectively. (The
procedure will estimate a mixture of β* and α* for any variable that appears in both
x_i and w_i.) In addition, linear regression of z_i on w_i produces estimates of α and σ_u²,
but there is no method of moments estimator of ρ or γ produced by this procedure,
so this estimator is incomplete. Newey (1987) suggested a "minimum chi-squared"
estimator that does estimate all parameters. A more direct, and actually simpler,
approach is full information maximum likelihood.
The log-likelihood is built up from the joint density of d_i and z_i, which we write
as the product of the conditional and the marginal densities:
\[
f(d_i, z_i) = f(d_i \mid z_i)\, f(z_i).
\]
To derive the conditional distribution, we use results for the bivariate normal, and
write:
\[
\varepsilon_i \mid u_i = \left[\frac{\rho\sigma_u}{\sigma_u^2}\right] u_i + v_i = \frac{\rho}{\sigma_u}\, u_i + v_i,
\]
where v_i is normally distributed with Var[v_i] = (1 − ρ²). Inserting this in the first
equation of equation (11.6), we have:
\[
d_i^* \mid z_i = x_i'\beta + \gamma z_i + (\rho/\sigma_u) u_i + v_i.
\]
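Since v_i has variance 1 − ρ², standardizing the latent index gives Prob(d_i = 1 | z_i) = Φ[(x_i'β + γz_i + (ρ/σ_u)u_i)/(1 − ρ²)^{1/2}], with u_i = z_i − w_i'α, and the marginal f(z_i) is the normal density with mean w_i'α and variance σ_u². A minimal sketch of the resulting log-likelihood follows; the function name `fiml_loglik` is hypothetical, and a single scalar x_i and w_i are used for brevity:

```python
import numpy as np
from scipy.stats import norm

def fiml_loglik(params, d, z, x, w):
    """Hypothetical sketch of the FIML log-likelihood built from
    f(d_i, z_i) = f(d_i | z_i) f(z_i); scalar x and w for brevity."""
    beta, gamma, alpha, rho, sigma_u = params
    u = z - alpha * w                      # reduced-form residual u_i = z_i - w_i'alpha
    # d*_i | z_i = x_i'beta + gamma z_i + (rho/sigma_u) u_i + v_i,
    # Var[v_i] = 1 - rho^2, so standardize by sqrt(1 - rho^2).
    a = (beta * x + gamma * z + (rho / sigma_u) * u) / np.sqrt(1.0 - rho**2)
    q = 2.0 * d - 1.0                      # maps d in {0, 1} to {-1, +1}
    ll_d = norm.logcdf(q * a)              # ln f(d_i | z_i)
    ll_z = norm.logpdf(u / sigma_u) - np.log(sigma_u)  # ln f(z_i)
    return float(np.sum(ll_d + ll_z))
```

Maximizing this sum over (β, γ, α, ρ, σ_u) with a numerical optimizer delivers the full information MLE; the q_i = 2d_i − 1 device is the usual sign transform that handles both outcomes in one Φ evaluation.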