interrater reliability. It can be quantified in many ways, including the kappa coefficient (Cohen, 1960) or the intraclass correlation coefficient (Shrout & Fleiss, 1979).
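
As an illustration, here is a minimal Python sketch of how interrater agreement might be quantified with Cohen's kappa for a dichotomous diagnostic decision. The ratings are hypothetical, and the computation is a bare-bones two-category version of the coefficient, not code from the source.

    # Hypothetical example: two clinicians independently rate the same 10
    # interviewees for the presence (1) or absence (0) of a diagnosis.
    rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]

    n = len(rater_a)

    # Observed agreement: proportion of interviewees rated identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: expected agreement given each rater's base rates.
    p_a_yes = sum(rater_a) / n
    p_b_yes = sum(rater_b) / n
    p_chance = p_a_yes * p_b_yes + (1 - p_a_yes) * (1 - p_b_yes)

    # Cohen's kappa corrects observed agreement for agreement expected by chance.
    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(f"Observed agreement: {p_observed:.2f}, kappa: {kappa:.2f}")

With these invented ratings, the two clinicians agree on 8 of 10 cases, but the agreement expected by chance alone is 0.50, so kappa works out to 0.60, noticeably lower than the raw agreement rate.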
The validity of an interview concerns how well the interview measures what it intends to measure. For example, a demonstration that scores from a depression interview correlate highly with scores from a well-respected self-report measure of depression would suggest there is some degree of validity in the use of this interview's scores to assess depression. Evidence for an interview's predictive validity would be demonstrated if scores from this measure were significantly correlated with (and therefore "predicted") future events believed to be relevant to that construct. For example, if scores from our depression interview were highly correlated with poorer academic performance over the next 2 months, then we might say we have evidence supporting the predictive validity of our interview.
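
To make the logic of such a predictive validity check concrete, here is a short hypothetical sketch in Python. The interview scores, later grade point averages, and the pure-Python Pearson correlation are all invented for illustration and are not part of the source.

    import math

    # Hypothetical data: depression-interview scores at intake and grade point
    # average measured 2 months later for the same eight students.
    interview_scores = [4, 12, 7, 15, 9, 20, 3, 11]
    later_gpa = [3.8, 3.1, 3.5, 2.6, 3.2, 2.2, 3.9, 3.0]

    def pearson_r(x, y):
        """Pearson correlation between two equal-length lists of scores."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
        sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
        return cov / (sd_x * sd_y)

    # A strong negative correlation would be evidence that higher depression
    # scores at intake "predict" poorer academic performance 2 months later.
    print(f"r = {pearson_r(interview_scores, later_gpa):.2f}")

A substantial correlation in the expected direction would support the predictive validity of the interview scores; a correlation near zero would not.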
As should be apparent, both the reliability and validity of a measure, such as an interview, are a matter of degree. Scores from interviews, like those from psychological tests, are neither perfectly reliable nor perfectly valid. But the higher the reliability and validity, the more confident we are in our conclusions. Let us turn now to look more closely at reliability and validity issues regarding interviews.


Reliability

Standardized (structured) interviews with clear scoring instructions will be more reliable than unstructured interviews. The reason is that structured interviews reduce both information variance and criterion variance. Information variance refers to the variation in the questions that clinicians ask, the observations that are made during the interview, and the method of integrating the information that is obtained (Rogers, 1995). Criterion variance refers to the variation in scoring thresholds among clinicians (Rogers, 1995). Clear-cut scoring guidelines make it more likely that two clinicians will score the same interviewee response in a similar way.
Because most of the research on the psychometric properties of interviews has focused on structured diagnostic interviews, we discuss these in some detail. For many years, diagnostic interviews were considered quite unreliable (Matarazzo, 1983; Ward et al., 1962). However, several things changed. First, with the introduction of DSM-III (American Psychiatric Association, 1980), operational criteria were developed for most of the mental disorder diagnoses. This made it much easier to know what features to assess in order to rule in or rule out a particular mental disorder diagnosis. Second, and perhaps more important, several groups of investigators developed structured interviews to systematically assess the various DSM criteria for mental disorders. Clearly, the reliability of the diagnostic information derived from structured interviews exceeds that obtained from unstructured interviews (Rogers, 1995).
As previously mentioned, the most common type of reliability assessed and reported for structured diagnostic interviews is interrater reliability. Another measure of reliability that is examined in structured diagnostic interviews, as well as other interviews, is test–retest reliability: the consistency of scores or diagnoses across time. We expect that, in general, individuals should receive similar scores or diagnoses when an interview is readministered. For example, a patient assigned a diagnosis of major depressive disorder based on a structured interview would be expected to receive the same diagnosis if reinterviewed (using the same structured interview) the next day. We expect the test–retest reliability of an interview to be quite high when the intervening time period between the initial testing and the retest is short (hours or a few days). However, when the intervening time period is long (months or years), test–retest reliability typically suffers. One reason, especially when assessing "current" mental disorder diagnoses, is that the psychological status of the patient may have changed. For example, the fact that a patient does not again receive a major depressive disorder diagnosis at 6-month retest is not necessarily an indictment of our structured interview. Because major depressive episodes can be of relatively short duration, our interview may be quite accurate in revealing no diagnosis at retest. The point is that the level of test–retest reliability that is obtained must be interpreted in the
