Validity
Validity is the extent to which an instrument measures what it is intended to measure.
Validity can be assessed qualitatively, through an examination of instrument content,
and quantitatively, through factor analysis and comparisons with related variables.
As with reliability, validity should not be seen as a fixed property; it must be
assessed in relation to the specific population and measurement objectives.
Content and face validity concern whether items adequately address the domain of
interest [6]. Both are qualitative judgments about whether an instrument is suitable
for its proposed application. Face validity is concerned with whether an instrument
appears to measure the domain of interest; content validity is a judgment about
whether instrument content adequately covers that domain.
There is increasing evidence that items within instruments tend to be concentrated
around the middle of the scale hierarchy, with relatively few items at the extremes
representing lower and higher levels of health. Such instruments may discriminate
poorly among respondents whose health lies near either extreme. Instrument content
should therefore be examined for relevance to the application and for adequate
coverage of the full range of the domain of interest.
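Coverage of the scale hierarchy can also be checked numerically once items have been placed on a common scale. The sketch below assumes item locations are available, for example item difficulties in logits from a Rasch analysis; the cut points and the item locations are hypothetical, chosen only to illustrate a middle-heavy distribution.

```python
from collections import Counter


def coverage_by_band(item_locations, cut_low=-1.0, cut_high=1.0):
    """Count items falling in the low, middle and high regions of a scale.

    item_locations: positions of items on a common scale hierarchy, for
    example Rasch item difficulties in logits (an assumption). cut_low and
    cut_high are hypothetical cut points, not recommended values.
    """
    counts = Counter()
    for loc in item_locations:
        if loc < cut_low:
            counts["low"] += 1      # items targeting one extreme of health
        elif loc > cut_high:
            counts["high"] += 1     # items targeting the other extreme
        else:
            counts["middle"] += 1   # items around the middle of the hierarchy
    # Report every band, including empty ones, in a fixed order.
    return {band: counts[band] for band in ("low", "middle", "high")}


# Hypothetical calibrations for a 10-item instrument.
locations = [-1.8, -0.6, -0.4, -0.2, 0.0, 0.1, 0.3, 0.5, 0.9, 1.6]
print(coverage_by_band(locations))  # {'low': 1, 'middle': 8, 'high': 1}
```

Which band corresponds to poorer or better health depends on the scoring direction of the instrument.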
Further evidence can be obtained by considering how the instrument was developed,
including the extent to which experts with relevant clinical or health status
measurement experience were involved in its development [44].
Validity testing should also involve some quantitative assessment. Criterion
validity is assessed by correlating an instrument with another instrument or measure
that is regarded as more accurate, that is, a criterion variable. Within the field of
patient-reported health measurement, a criterion or “gold standard” measure against
which to test the validity of an instrument rarely exists. There are two exceptions.
The first arises when an instrument is shortened, with the longer version used as the
“gold standard” for developing the short version [16]. Scores for the short and long
versions of the instrument are compared, the objective being a very high level of
correlation. The second arises when an instrument has prediction as its measurement
objective, in which case a gold standard is available either concurrently or in the
future. For example, the criterion validity of an instrument designed to predict the
presence of a particular disease (screening) can be assessed through comparison with
the results of diagnosis, or with a prospective outcome such as length of hospital
stay or mortality.
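Both exceptions reduce to simple comparisons with the criterion. The sketch below, using only the Python standard library, computes a Pearson correlation between short-form and long-form scores, and the sensitivity and specificity of a screening result against diagnosis. All scores and results are hypothetical, and the choice of Pearson's r and of sensitivity/specificity as summary statistics is an assumption for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length (n >= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sensitivity_specificity(screen_positive, diagnosed):
    """Compare screening results (True = positive) with the criterion diagnosis."""
    pairs = list(zip(screen_positive, diagnosed))
    tp = sum(1 for s, d in pairs if s and d)          # true positives
    fn = sum(1 for s, d in pairs if not s and d)      # missed cases
    tn = sum(1 for s, d in pairs if not s and not d)  # true negatives
    fp = sum(1 for s, d in pairs if s and not d)      # false alarms
    return tp / (tp + fn), tn / (tn + fp)


# Exception 1: short form against the long form as "gold standard" (hypothetical scores).
short_form = [12, 15, 9, 20, 17, 11]
long_form = [30, 38, 22, 49, 41, 27]
print(round(pearson_r(short_form, long_form), 3))  # 0.998

# Exception 2: screening instrument against diagnosis (hypothetical results).
screen = [True, True, False, True, False, False, True, False]
diagnosis = [True, True, True, False, False, False, True, False]
print(sensitivity_specificity(screen, diagnosis))  # (0.75, 0.75)
```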
In the absence of a criterion variable, validity testing takes the form of construct
validation. PROMs are developed to measure an underlying construct such as physical
functioning or pain. On the basis of current understanding, such a construct can be
expected to have a set of quantitative relationships with other constructs; for
example, patients experiencing more severe pain may be expected to take more
analgesics. Construct validity is assessed by comparing the scores produced by an
instrument with scores on sets of related variables and checking whether the observed
relationships match those expected. The expected levels of correlation should be
specified at the outset of a study [45].
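Specifying expected correlations at the outset turns construct validation into a set of checkable hypotheses. The sketch below encodes the pain and analgesic example as one such hypothesis, with an expected direction and a minimum correlation; the threshold of 0.5, the helper name `check_hypothesis`, and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def check_hypothesis(scores, other, expected_sign, min_abs_r):
    """Return (r, supported): supported if r has the expected sign and |r| >= min_abs_r."""
    r = pearson_r(scores, other)
    supported = r * expected_sign > 0 and abs(r) >= min_abs_r
    return r, supported


# Hypothesis fixed before analysis (hypothetical threshold):
# pain scores should correlate positively with analgesic use, with r >= 0.5.
pain_scores = [2, 5, 7, 3, 8, 6, 4, 9]
analgesic_doses = [1, 4, 6, 2, 7, 4, 3, 8]
r, supported = check_hypothesis(pain_scores, analgesic_doses, expected_sign=+1, min_abs_r=0.5)
print(f"r = {r:.2f}, supported = {supported}")  # r = 0.99, supported = True
```

Fixing the sign and threshold before the data are seen is what distinguishes this from simply reporting whatever correlations happen to emerge.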
Many instruments are multidimensional and measure several constructs, including
physical functioning, mental health, and social functioning. Each of these constructs,
together with its expected relationships with sets of variables, should be considered
when assessing construct validity. Furthermore, the internal structure of such
M. El Gaafary