- 2 Selection Bias Correction
Sample selection bias refers to the case where observations above or below a certain
threshold on y are not observed. The result is bias in the regression coefficient.
(Similar selection on the independent variable does not ordinarily cause this
problem. Rather, it is a range restriction problem, as discussed earlier.) Heckman
(1979: 155) shows that sample selection bias is a special case of omitted variable bias.
Thus, as noted below, the correction involves creating a new variable to add to the
model.
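One way to see the omitted-variable interpretation is to write out the expected outcome among the observed cases. The notation here is assumed for illustration and is not taken from the text: $x_i\beta$ is the substantive model, a case is observed when $z_i\gamma + u_i > 0$, the two error terms are jointly normal with correlation $\rho$, $\sigma_\varepsilon$ is the standard deviation of the substantive error, and $\phi$ and $\Phi$ are the standard normal density and distribution functions.

```latex
E[y_i \mid x_i,\ \text{observed}] = x_i\beta + \rho\,\sigma_\varepsilon\,\lambda(z_i\gamma),
\qquad
\lambda(a) = \frac{\phi(a)}{\Phi(a)}
```

Under these assumptions, a regression of y on x alone in the selected sample leaves out the term in $\lambda$, so the coefficient on x is biased whenever $\rho \neq 0$ and $\lambda$ is correlated with x.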
In the HR and performance literature, sample selection bias may arise, for
example, because firms or plants having less effective HR strategies may be less
likely to survive than those having more effective HR strategies (Gerhart et al.
1996). The consequence of observing only firms/plants that survive might be a
downward bias in the estimate of the HR and performance relationship.
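The direction of this bias can be illustrated with a small simulation. This is a minimal sketch under assumed values, not an analysis of real data: the variable names (`hr`, `perf`), the true slope of 1.0, the survival threshold of zero, and the sample size are all hypothetical. It compares the estimated slope in the full population, in a sample observed only when performance exceeds the threshold, and in a sample selected on the HR measure instead.

```python
import random


def slope(x, y):
    """OLS slope of y on x (with an intercept): cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def fit(sample):
    """Estimate the HR-performance slope for a list of (hr, perf) pairs."""
    return slope([h for h, _ in sample], [p for _, p in sample])


# Hypothetical population of firms; all numbers are made up for illustration.
random.seed(1)
firms = []
for _ in range(5000):
    hr = random.gauss(0, 1)
    perf = 1.0 * hr + random.gauss(0, 1)  # true slope is 1.0
    firms.append((hr, perf))

full = fit(firms)
on_perf = fit([f for f in firms if f[1] > 0])  # observed only if perf is above 0
on_hr = fit([f for f in firms if f[0] > 0])    # observed only if hr is above 0

print("full population slope:     ", round(full, 3))
print("selected on performance:   ", round(on_perf, 3))
print("selected on HR (predictor):", round(on_hr, 3))
```

In this setup, selecting on performance should pull the slope well below 1.0, whereas selecting on the HR measure should leave the slope close to 1.0 and mainly restrict the range of the predictor.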
The Heckman two-step correction procedure (Heckman 1979) estimates a selection
equation and a substantive equation. The selection equation (often using
probit or logit) models the probability that observations from a population are
included in the sample at hand. Based on the selection equation, the inverse Mills
ratio, which reflects each observation's estimated probability of being excluded
from the sample, is computed and then added as a variable to the substantive
equation, which here would be the relationship between HR and performance.
(See Berk 1983 for a primer on selection bias.)
Whether this ‘correction’ produces improved estimates depends, however, on the
specific characteristics of the data and the nature of the estimates obtained from the
selection equation (Stolzenberg and Relles 1997). Thus, whenever a selection bias
correction is used, it is necessary to report the full selection equation results
(variables, coefficients, and fit).
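The two steps described above can be sketched in a few lines of standard-library Python. This is an illustrative sketch on simulated data, not the procedure as implemented in any particular statistics package. Everything specific is an assumption: the variable `z` is a hypothetical predictor of survival that does not enter the performance equation, the true HR coefficient is set to 1.0, the error correlation `rho` is set to 0.8, and the probit is fitted with a hand-written Fisher scoring loop. The second-step standard errors, which need adjustment in practice, are not computed.

```python
import random
from statistics import NormalDist

N01 = NormalDist()  # standard normal, used for the probit link and the Mills ratio


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def probit(rows, s, iterations=20):
    """Probit maximum likelihood by Fisher scoring, starting from zero."""
    k = len(rows[0])
    g = [0.0] * k
    for _ in range(iterations):
        info = [[0.0] * k for _ in range(k)]
        score = [0.0] * k
        for x, si in zip(rows, s):
            idx = sum(gi * xi for gi, xi in zip(g, x))
            p = min(max(N01.cdf(idx), 1e-10), 1 - 1e-10)
            d = N01.pdf(idx)
            w = d * d / (p * (1 - p))          # expected information weight
            r = d * (si - p) / (p * (1 - p))   # score contribution
            for i in range(k):
                score[i] += r * x[i]
                for j in range(k):
                    info[i][j] += w * x[i] * x[j]
        g = [gi + st for gi, st in zip(g, solve(info, score))]
    return g


# Simulated population of firms (all values are made up for illustration).
random.seed(0)
rho = 0.8  # assumed correlation between survival and performance errors
firms = []
for _ in range(4000):
    hr, z, u = (random.gauss(0, 1) for _ in range(3))
    e = rho * u + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1)
    survived = 1 if hr + z + u > 0 else 0
    perf = 1.0 + 1.0 * hr + e  # true HR coefficient is 1.0
    firms.append((hr, z, survived, perf))

# Step 1: selection equation estimated on ALL firms, survivors or not.
gamma = probit([[1.0, hr, z] for hr, z, _, _ in firms],
               [s for _, _, s, _ in firms])


def mills(hr, z):
    """Inverse Mills ratio evaluated at the fitted selection index."""
    idx = gamma[0] + gamma[1] * hr + gamma[2] * z
    return N01.pdf(idx) / N01.cdf(idx)


# Step 2: substantive equation on survivors only, with and without the ratio.
survivors = [(hr, z, perf) for hr, z, s, perf in firms if s == 1]
y = [perf for _, _, perf in survivors]
naive = ols([[1.0, hr] for hr, _, _ in survivors], y)
heckman = ols([[1.0, hr, mills(hr, z)] for hr, z, _ in survivors], y)

print("selection equation coefficients:", [round(g, 3) for g in gamma])
print("naive HR coefficient:           ", round(naive[1], 3))
print("corrected HR coefficient:       ", round(heckman[1], 3))
print("coefficient on Mills ratio:     ", round(heckman[2], 3))
```

Printing the selection equation alongside the substantive results mirrors the reporting recommendation above. The sketch also makes one practical requirement visible: step 1 needs data on the cases that dropped out of the sample, here the non-surviving firms.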
- 3 Fixed Effects
A fixed effects model can be considered where there are longitudinal data on both
HR and performance, as well as sufficient variance in changes in HR and performance
over time. This estimator is also known as the dummy variable, within-subjects,
or first difference (in the special case of two time periods only) estimator.
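The equivalence of the within-subjects and first difference estimators with two time periods can be checked numerically. The following is a minimal sketch on simulated data with hypothetical values: a true HR coefficient of 0.5, an unobserved firm effect that is constant over time and correlated with HR, and no common shift between the two periods (so both estimators are computed without an intercept).

```python
import random


def slope_with_intercept(x, y):
    """OLS slope of y on x including an intercept: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))


def slope_through_origin(x, y):
    """OLS slope of y on x without an intercept: sum(x*y) / sum(x*x)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical two-period panel; all numbers are made up for illustration.
random.seed(2)
true_b = 0.5
panel = []
for _ in range(2000):
    firm_effect = random.gauss(0, 1)  # unobserved and constant over time
    hr = [0.8 * firm_effect + random.gauss(0, 1) for _ in range(2)]
    perf = [true_b * h + firm_effect + random.gauss(0, 0.7) for h in hr]
    panel.append((hr, perf))

# Pooled OLS ignores the firm effect, which is correlated with HR.
pooled = slope_with_intercept([h for hr, _ in panel for h in hr],
                              [p for _, perf in panel for p in perf])

# First difference: regress the change in performance on the change in HR.
first_diff = slope_through_origin([hr[1] - hr[0] for hr, _ in panel],
                                  [perf[1] - perf[0] for _, perf in panel])

# Within-subjects: subtract each firm's own mean from both variables.
within = slope_through_origin(
    [h - sum(hr) / 2 for hr, _ in panel for h in hr],
    [p - sum(perf) / 2 for _, perf in panel for p in perf])

print("pooled OLS:      ", round(pooled, 3))
print("first difference:", round(first_diff, 3))
print("within-subjects: ", round(within, 3))
```

In this setup the pooled estimate should sit well above 0.5 because it absorbs the firm effect, while the first difference and within-subjects estimates should agree with each other up to rounding error and fall near 0.5.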
To see the potential advantage, specify equations for the relationship between HR
and performance for time $t$ and time $t-1$, respectively, with $b_{t-1} = b_t = b$:

$$\text{perf}_t = \text{hr}_t\, b + e_t$$

$$\text{perf}_{t-1} = \text{hr}_{t-1}\, b + e_{t-1}$$
Decompose the residuals into a time-varying part, $v$, and a time-invariant part, $u$:

$$\text{perf}_t = \text{hr}_t\, b + v_t + u$$
564 b a r r y g e r h a r t