rheumatology. Consequently, the majority of papers on measurement properties of
PROs in rheumatology utilize CTT approaches to reliability.
Key shortcomings of CTT as a test theory for PROs are that CTT scores are
instrument-dependent and apply only to the total score of a specific measure. For
instance, if we were interested in measuring physical disability in a certain
population of patients, we may choose from a number of available PROs for this
purpose. However, each individual PRO is associated with its own true score, since
individual PROs tend to have idiosyncratic items, response options, and scoring
procedures. Consequently, scores can only be compared between studies or over
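time if the exact same instrument is administered. This has led to a situation in which
only a limited number of different PROs are typically used for a given purpose within a field,
and in which floor and ceiling effects are difficult to address, because items targeting
patients at the extremes of the trait cannot be added to an instrument without
compromising comparability with earlier scores. Similarly,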
many item and person parameters derived from CTT depend on the particular instrument
or sample from which they were obtained. For instance, the reliability coefficient defined
previously changes with the true score variance of the sample in which it is estimated,
even when the measurement error variance remains constant.
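To make this dependence concrete, the short Python sketch below assumes the usual CTT definition of reliability as the ratio of true score variance to observed score variance, var(T) / (var(T) + var(E)), with true scores and errors uncorrelated. The function name `ctt_reliability` and the variance values are hypothetical and chosen only for illustration; they do not come from any real PRO.

```python
def ctt_reliability(true_var: float, error_var: float) -> float:
    """Reliability under classical test theory: var(T) / (var(T) + var(E))."""
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    # Observed score X = T + E with T and E uncorrelated, so var(X) = var(T) + var(E).
    observed_var = true_var + error_var
    if observed_var == 0:
        raise ValueError("observed score variance is zero; reliability is undefined")
    return true_var / observed_var


# Same instrument, identical measurement error variance (illustrative values),
# administered to two samples that differ only in true score heterogeneity.
ERROR_VAR = 4.0
for label, true_var in [("homogeneous sample", 4.0), ("heterogeneous sample", 36.0)]:
    print(f"{label}: reliability = {ctt_reliability(true_var, ERROR_VAR):.2f}")
```

With the error variance fixed at 4, the same instrument yields a reliability of 0.50 in the homogeneous sample and 0.90 in the heterogeneous one, although its measurement precision has not changed at all.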
Item Response Theory
Item response theory (IRT) is a statistical framework that allows a more flexible
approach to assigning PRO scores. IRT was developed to overcome measurement
problems with CTT and is increasingly used to develop new instruments and to
facilitate comparisons between existing ones. In IRT, the focus
moves from the scale level to the level of individual items. Individual items are
described by item characteristic functions, which express the probability that a patient
will give a certain response to a PRO item as a function of the patient's level of the
measured trait, with the probability of responding in a higher category increasing
monotonically with that level. Most commonly, the logistic function is used as the
elemental unit to trace the conditional probability that a patient will respond in
category x, rather than x − 1, given that the response was in x or x − 1. The
main advantage of IRT over the classical approach is that its item-level models provide
item and person parameters that are invariant with respect to the population
that was used to obtain them. This means that the parameters that characterize an
item do not depend on the trait distribution of the sample used to estimate them
and, conversely, that the parameters that characterize a patient do not depend on
the specific set of items used to estimate them. Therefore,
once the item parameters of a set of items have been estimated, an estimate of the
location of a responding patient on the latent continuum of the measured trait
may be obtained from any subset of the calibrated items, and researchers are no
longer bound to static questionnaires in order to obtain information regarding a
PRO domain. Note, however, that the latent scale of an IRT model is itself
arbitrary, so the item parameters of unequated item banks are invariant only up to
a linear transformation [50]. Calibrated item banks can be used to create
more efficient measures by administering only the most relevant questions for
M.A.H.O. Voshaar and M.A.F.J. van de Laar