would be collinear.^7 However, under the assumption of bivariate normal errors, λC(·) is a non-linear function. As Heckman and Navarro-Lozano (2004) note, collinearity between the outcome regression function (here, and usually, the linear function Xiβ) and the selection "control" function λC(·) is not a generic feature, so some degree of non-linearity will probably allow the specification to be estimated even when there are no exclusion restrictions.
In practice, the identification issue is less clear cut. The problem is that while λC(·) is a non-linear function, it is roughly linear in parts of its domain. Hence, it is entirely possible that λC(Z′γ) has very little variation relative to the remaining variables in equation (10), i.e., X. This issue can clearly arise when the selection variables Z and the outcome variables X are identical. However, it is important to realize that merely having extra instruments in Z may not solve the problem. The quality of the instruments also matters: near-multicollinearity can still arise when the extra instruments in Z are weak and have limited explanatory power.
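A minimal numerical sketch (in Python, with illustrative values of the index) shows how strong this near-linearity can be: over a moderate range of the single index Z′γ, a straight line explains almost all of the variation in the inverse Mills ratio.

```python
import numpy as np
from scipy.stats import norm

# Inverse Mills ratio lambda_C(v) = phi(v) / Phi(v): the selection
# "control" function under bivariate normal errors.
def inv_mills(v):
    return norm.pdf(v) / norm.cdf(v)

# Evaluate lambda_C over a moderate range of the index Z'gamma and
# fit a straight line to it.
v = np.linspace(-1.5, 1.5, 200)
lam = inv_mills(v)
slope, intercept = np.polyfit(v, lam, 1)
resid = lam - (slope * v + intercept)
r2 = 1.0 - resid.var() / lam.var()
print(f"R^2 of a linear fit to lambda_C: {r2:.3f}")  # close to 1
```

With λC(Z′γ) this close to a linear function of the index, its contribution to the second-stage regression is nearly absorbed by X whenever Z′γ is itself (nearly) a linear combination of X.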
What should one do if there appears to be a multicollinearity issue? It is tempting to recommend that the researcher impose additional exclusion restrictions so that the self-selection instruments Z contain unique variables not spanned by the outcome variables X. Matters are, of course, a little more delicate. Either the exclusions make sense, in which case they should have been imposed in the first place, or the restrictions are not reasonable, in which case it hardly makes sense to force them on a model merely to make it estimable. In any event, as a practical matter, it seems reasonable to always run diagnostics for multicollinearity when estimating selection models, whether one imposes exclusion restrictions or not.
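As a concrete illustration of such diagnostics, the sketch below is a hypothetical example: the function name `mills_collinearity`, the simulated data, and the probit coefficients are all placeholders. It regresses the estimated control function on the outcome variables and reports the resulting R² together with the condition number of the second-stage design matrix.

```python
import numpy as np
from scipy.stats import norm

# How close is the estimated control function lambda_C(Z'gamma_hat)
# to being spanned by [1, X]?
def mills_collinearity(X, Z, gamma_hat):
    lam = norm.pdf(Z @ gamma_hat) / norm.cdf(Z @ gamma_hat)
    D = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(D, lam, rcond=None)
    r2 = 1.0 - (lam - D @ beta).var() / lam.var()  # near 1 signals trouble
    full = np.column_stack([D, lam])               # second-stage design
    cond = np.linalg.cond(full / np.linalg.norm(full, axis=0))
    return r2, cond

# With Z identical to X, the control function is nearly collinear
# with the outcome regressors.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
print(mills_collinearity(X, X, np.array([0.5, 0.5])))
```

A high R² or a large condition number warns that the coefficient on the inverse Mills ratio is being identified off very little independent variation.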
The data often offer one degree of freedom that can be used to work around particularly thorny cases of collinearity. Recall that the identification issue arises mainly because of the 1/0 nature of the selection variable Wi, which implies that we do not observe the error term ηi and must instead take its expectation, the inverse Mills ratio term. However, if we could observe the magnitude of the selection variable Wi, we would introduce an independent source of variation into the selection correction term and, in effect, observe the private information ηi itself and use it in the regression in lieu of the inverse Mills ratio. Exclusion restrictions are then no longer needed. This is often more than just a theoretical possibility. For instance, in analyzing a sample of firms that have received a bank loan, we observe the bank loan amount conditional on a loan being made. Likewise, in analyzing equity offerings, we observe not only the fact that a firm made an equity offering but also the size of the offer. In hedging, we observe (an estimate of) the extent of hedging given that a firm has hedged. This introduces an independent source of variation into the private information variable, freeing one from the reliance on non-linearity for identification.
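A stylized simulation sketches the idea; all variable names, the data-generating process, and parameter values below are invented for the example. Regressing the observed magnitude on the selection variables Z yields a residual that proxies for the private information ηi, and that residual, rather than the inverse Mills ratio, enters the outcome regression.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Invented data: Z drives the selection decision, X the outcome,
# and eta is the firm's private information.
Z = rng.normal(size=(n, 2))
X = rng.normal(size=(n, 2))
eta = rng.normal(size=n)

# The latent selection index; its magnitude (e.g., a loan amount)
# is observed whenever the firm self-selects (W_i = 1).
latent = Z @ np.array([1.0, 0.5]) + eta
selected = latent > 0
amount = latent[selected]

# Outcome, observed only for selected firms, depends on eta.
y = (X @ np.array([1.0, -1.0]) + 0.8 * eta + rng.normal(size=n))[selected]

# Step 1: regress the observed magnitude on Z; the residual is a
# proxy for the private information eta_i.
stage1 = sm.OLS(amount, sm.add_constant(Z[selected])).fit()
eta_hat = stage1.resid

# Step 2: include eta_hat in the outcome regression in lieu of the
# inverse Mills ratio; no exclusion restriction is needed.
design = sm.add_constant(np.column_stack([X[selected], eta_hat]))
stage2 = sm.OLS(y, design).fit()
print(stage2.params)  # last coefficient ~ effect of private information
```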
^7 In this case, having a variable in X that is not part of Z does not help matters. If λC(·) is indeed linear, it is spanned by X whenever Z is spanned by X. Thus, we require extra variables that explain the decision to self-select but are unrelated to the outcomes following self-selection.