David T. Jacho-Chávez and Pravin K. Trivedi 809
which is usually inconsistent with the data. The negative binomial regression is
often the next step because it can accommodate overdispersion in the data. Because
a common and plausible explanation of overdispersion is the presence of
heterogeneity in the data, another approach is to allow for the possibility
that a better specification is a two-component Poisson mixture, with each
component corresponding to one type of individual. The results for this specification
are presented in the next two columns. A further generalization is to allow
for the possibility that the distribution is a two-component mixture of negative
binomial (NB2) distributions with a quadratic variance function. The advantage
here is that the NB2 specification also allows for within-group heterogeneity:
each component represents the behavior of one group of individuals, while the
NB2 variance function accommodates heterogeneity within that group.
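
The two-component Poisson mixture described above can be sketched in a few lines of code. The following is a minimal illustration, not the estimation code behind the reported results: it assumes an exponential conditional mean for each component and a mixing proportion that does not vary with the predictors. The function names, the toy data, and the parameter values are all hypothetical.

```python
import math


def poisson_logpmf(y, mu):
    """Log Poisson probability of count y given mean mu > 0."""
    return y * math.log(mu) - mu - math.lgamma(y + 1.0)


def fm2_poisson_loglik(y, X, beta1, beta2, pi1):
    """Log-likelihood of a two-component Poisson mixture.

    y: list of counts; X: list of regressor rows (include a 1.0 for the
    intercept); beta1, beta2: component coefficient lists; pi1: mixing
    proportion of component 1, strictly between 0 and 1.
    """
    total = 0.0
    for y_i, x_i in zip(y, X):
        # Exponential conditional mean for each component (an assumption).
        mu1 = math.exp(sum(b * x for b, x in zip(beta1, x_i)))
        mu2 = math.exp(sum(b * x for b, x in zip(beta2, x_i)))
        a = math.log(pi1) + poisson_logpmf(y_i, mu1)
        b = math.log(1.0 - pi1) + poisson_logpmf(y_i, mu2)
        # log(exp(a) + exp(b)), computed stably via the log-sum-exp trick.
        m = max(a, b)
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total


# Toy data, purely illustrative (not the data used in the chapter).
y = [0, 2, 5, 11]
X = [[1.0, 0.2], [1.0, -0.4], [1.0, 1.1], [1.0, 0.7]]
ll = fm2_poisson_loglik(y, X, [0.5, 0.3], [2.0, 0.1], 0.4)
```

In practice this function would be passed to a numerical optimizer, or the model would be fitted by the EM algorithm. When the two coefficient vectors coincide, the mixture collapses to the ordinary Poisson regression, which is the sense in which the simpler model is nested in the mixture.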
Because the three models are nested, the log-likelihoods are comparable and
the likelihood ratio can be used to test the restrictive models against the general
2-component mixture of NB2 distributions. Clearly, the FM2 specification of the
NB2 distribution is the best-fitting model. The two components correspond to low
users (around 40.5% of the population) and high users (around 59.5% of the population)
of doctor visits. Evaluated at the sample average of the predictors, the
conditional mean is 3.92 for the first group and 8.62
for the second group. The groups also differ in their sensitivity to variations in
the predictors. Of course, we have deliberately used a very simple specification
so the exact numbers are only illustrative. The important point is that, although
the mixture models are harder to estimate, especially when the number of components
is increased, they are much more informative about the heterogeneity in
the population.
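
As a small worked illustration of how the component results combine, the figures quoted above imply an overall conditional mean at the sample average of the predictors. The sketch below uses only the numbers reported in the text and assumes that the mixing proportions do not depend on the predictors. The variable names are hypothetical.

```python
# Figures reported in the text for the two-component NB2 mixture.
pi_low, pi_high = 0.405, 0.595   # mixing proportions (low and high users)
mu_low, mu_high = 3.92, 8.62     # component means at the sample average

# Mean of a finite mixture: proportion-weighted average of component means.
mixture_mean = pi_low * mu_low + pi_high * mu_high
print(round(mixture_mean, 2))  # 6.72
```

A single-component model would report only one such mean, whereas the mixture attributes it to two groups whose means differ by a factor of about 2.2.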
15.6 Simulation-based maximum likelihood
In this section we consider an application of the MSL estimator to a nonlinear
model with discrete outcomes and endogenous dummy regressors. There are two
well-established approaches, limited information and full information, for handling
endogeneity in linear models. Full information methods,
based on the joint distribution of all endogenous variables, are often harder to implement
because closed-form expressions for the joint distribution are rarely available.
Thus there is strong motivation for limited-information methods based on instrumental
variables, such as the GMM and two-stage sequential estimation. These
have been extended to nonlinear models in a number of special cases, sometimes
on an ad hoc computationally feasible basis, though not always with supporting
formal justification. However, the consistency property of the two-step estimator
may depend on particular assumptions about the structure of dependence. For
example, in discrete outcome models, sequential two-step estimation, based on the
replacement of an endogenous variable by a fitted value, yields a consistent estimator
only if the causal structure is recursive (Blundell and Powell, 2004; Chesher,
2005).