variable (e.g. using logistic regression) on a set of covariates. Rosenbaum and
Rubin show that, in large samples, the propensity score approach ensures that ‘if
treatment and control groups have the same distribution of propensity scores, they
have the same distribution of all observed covariates, just like in a randomized
experiment’ (Rubin 2001: 171). (Consistent with Cochran’s example, Heinsman and
Shadish 1996 found that the size of pretest differences between groups was one of
the most important factors in leading to different findings from experiments versus
non-experiments.)
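The balancing property quoted above can be written compactly. The notation below is the conventional one from the propensity score literature and is introduced here only for illustration (the symbols Z for the treatment indicator, X for the observed covariates, and e(x) for the propensity score do not appear in the text):

```latex
e(x) = \Pr(Z = 1 \mid X = x), \qquad X \perp\!\!\!\perp Z \mid e(X)
```

That is, among units with the same propensity score, treatment status is unrelated to the observed covariates. Nothing in this statement extends to covariates that were not observed.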
The advantage of using propensity scores over matching on multiple covariates
is that the latter can become unwieldy as the number of covariates increases.
Compared to simply including covariates in a regression model, propensity scores
also use fewer degrees of freedom. Equally important, propensity scores, when
used as recommended, compare only groups and subgroups with very
similar propensity scores, thus addressing the concern raised in the Cochran (1957)
quote above.
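The two steps described above (regress the treatment variable on the covariates, then compare only units with similar scores) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a recommended implementation: the data are simulated, the true treatment effect of 2.0 and all coefficients are invented for the example, the logistic regression is fitted with a simple hand-written gradient ascent, and comparison is done by stratifying on score quintiles (one common way of comparing units with similar scores). The function names `fit_logistic` and `stratified_effect` are hypothetical.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean(values):
    return sum(values) / len(values)

def fit_logistic(X, t, lr=0.5, epochs=300):
    """Fit a logistic regression of treatment t on covariates X by batch
    gradient ascent. Returns the weights, intercept first."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = ti - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def stratified_effect(y, t, scores, n_strata=5):
    """Average the treated-minus-control difference in mean outcome within
    strata of similar propensity score, weighting by stratum size."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = len(order) // n_strata
    total, used = 0.0, 0
    for s in range(n_strata):
        end = (s + 1) * size if s < n_strata - 1 else len(order)
        idx = order[s * size:end]
        treated = [y[i] for i in idx if t[i] == 1]
        control = [y[i] for i in idx if t[i] == 0]
        if treated and control:  # skip strata lacking one of the groups
            total += (mean(treated) - mean(control)) * len(idx)
            used += len(idx)
    return total / used

# Simulated data (all numbers are assumptions made for this sketch):
# two observed covariates drive both treatment assignment and the outcome.
TRUE_EFFECT = 2.0
rng = random.Random(0)
X, t, y = [], [], []
for _ in range(2000):
    x1, x2 = rng.gauss(0, 1), rng.gauss(0, 1)
    ti = 1 if rng.random() < sigmoid(0.8 * x1 + 0.5 * x2) else 0
    X.append([x1, x2])
    t.append(ti)
    y.append(TRUE_EFFECT * ti + 1.5 * x1 + 1.0 * x2 + rng.gauss(0, 1))

w = fit_logistic(X, t)
scores = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]

naive = mean([yi for yi, ti in zip(y, t) if ti == 1]) - \
        mean([yi for yi, ti in zip(y, t) if ti == 0])
adjusted = stratified_effect(y, t, scores)
print(f"naive difference:    {naive:.2f}")
print(f"stratified on score: {adjusted:.2f}")
```

In this simulation the unadjusted group difference overstates the effect because treated units have higher covariate values, and the score-stratified estimate moves toward the simulated effect. The adjustment works here only because every variable that drives assignment was observed and included in the score.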
The limitations of propensity scores are similar to those for the classical
regression or ANCOVA model. Although only similar propensity scores are compared,
these continue to be based only on known/observable covariates. Any
non-response is assumed to be random/ignorable (Rubin 1976) and it is also assumed
that the treatment variable is exogenous to the outcome or Y variable. Another
assumption is that the responses in one treatment group are not affected by the
treatment received by respondents in other groups (e.g. as in the case where the
groups compete for resources or access to treatment), referred to as the stable
unit-treatment value assumption (SUTVA, Rosenbaum and Rubin 1983). Finally,
although they can be used in the case of a continuous treatment variable (for an
example, see Hirano and Imbens 2004), propensity scores have been primarily used
in cases where the independent variable is categorical (as in an ANOVA design).
The propensity score approach is currently receiving a great deal of attention in
econometrics and has been used in medical research as well. Some evidence
suggests that propensity scores can produce treatment effect estimates from
quasi-experimental designs close to those of randomized experiments (Dehejia
and Wahba 1999; Hirano et al. 2003). Not surprisingly, the method works best when
characteristics of the non-equivalent groups are similar and when variables are
measured in the same way for both groups (Heckman et al. 1997), but there is a
debate regarding how typical that situation is and thus how broadly useful
propensity score matching is (see point-counterpoint, Journal of Econometrics,
March/April 2005). To date, there appear to be no applications of propensity scores in the
management literature.
(e.g. training, no training) may differ from experienced treatment (e.g. some in training condition
may not attend, whereas some not in the training condition may gain knowledge of what is taught).
An application of this model to the HR performance literature may be where intended HR policy
differs from experienced HR policy.