interrater reliability. It can be quantified in many ways, including the kappa coefficient (Cohen, 1960) or the intraclass correlation coefficient (Shrout & Fleiss, 1979).
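
As an illustration, here is a minimal Python sketch of how interrater agreement might be quantified with Cohen's kappa for a dichotomous diagnostic decision. The ratings are hypothetical, and the computation is a bare-bones two-category version of the coefficient, not code from the source.

    # Hypothetical example: two clinicians independently rate the same 10
    # interviewees for the presence (1) or absence (0) of a diagnosis.
    rater_a = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
    rater_b = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]

    n = len(rater_a)

    # Observed agreement: proportion of interviewees rated identically.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: expected agreement given each rater's base rates.
    p_a_yes = sum(rater_a) / n
    p_b_yes = sum(rater_b) / n
    p_chance = p_a_yes * p_b_yes + (1 - p_a_yes) * (1 - p_b_yes)

    # Cohen's kappa corrects observed agreement for agreement expected by chance.
    kappa = (p_observed - p_chance) / (1 - p_chance)
    print(f"Observed agreement: {p_observed:.2f}, kappa: {kappa:.2f}")

With these invented ratings, the two clinicians agree on 8 of 10 cases, but the agreement expected by chance alone is 0.50, so kappa works out to 0.60, noticeably lower than the raw agreement rate.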
The validity of an interview concerns how well the interview measures what it intends to measure. For example, a demonstration that scores from a depression interview correlate highly with scores from a well-respected self-report measure of depression would suggest there is some degree of validity in the use of this interview's scores to assess depression. Evidence for an interview's predictive validity would be demonstrated if scores from this measure were significantly correlated with (and therefore "predicted") future events believed to be relevant to that construct. For example, if scores from our depression interview were highly correlated with poorer academic performance over the next 2 months, then we might say we have evidence supporting the predictive validity of our interview.
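
To make the logic of such a predictive validity check concrete, here is a short hypothetical sketch in Python. The interview scores, later grade point averages, and the pure-Python Pearson correlation are all invented for illustration and are not part of the source.

    import math

    # Hypothetical data: depression-interview scores at intake and grade point
    # average measured 2 months later for the same eight students.
    interview_scores = [4, 12, 7, 15, 9, 20, 3, 11]
    later_gpa = [3.8, 3.1, 3.5, 2.6, 3.2, 2.2, 3.9, 3.0]

    def pearson_r(x, y):
        """Pearson correlation between two equal-length lists of scores."""
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
        sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
        return cov / (sd_x * sd_y)

    # A strong negative correlation would be evidence that higher depression
    # scores at intake "predict" poorer academic performance 2 months later.
    print(f"r = {pearson_r(interview_scores, later_gpa):.2f}")

A substantial correlation in the expected direction would support the predictive validity of the interview scores; a correlation near zero would not.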
As should be apparent, both the reliability and validity of a measure, such as an interview, are a matter of degree. Scores from interviews, like those from psychological tests, are neither perfectly reliable nor perfectly valid. But the higher the reliability and validity, the more confident we are in our conclusions. Let us turn now to look more closely at reliability and validity issues regarding interviews.


Reliability

Standardized (structured) interviews with clear scoring instructions will be more reliable than unstructured interviews. The reason is that structured interviews reduce both information variance and criterion variance. Information variance refers to the variation in the questions that clinicians ask, the observations that are made during the interview, and the method of integrating the information that is obtained (Rogers, 1995). Criterion variance refers to the variation in scoring thresholds among clinicians (Rogers, 1995). Clear-cut scoring guidelines make it more likely that two clinicians will score the same interviewee response in a similar way.
Because most of the research on the psychometric properties of interviews has focused on structured diagnostic interviews, we discuss these in some detail. For many years, diagnostic interviews were considered quite unreliable (Matarazzo, 1983; Ward et al., 1962). However, several things changed. First, with the introduction of DSM-III (American Psychiatric Association, 1980), operational criteria were developed for most of the mental disorder diagnoses. This made it much easier to know what features to assess in order to rule in or rule out a particular mental disorder diagnosis. Second, and perhaps more important, several groups of investigators developed structured interviews to systematically assess the various DSM criteria for mental disorders. Clearly, the reliability of the diagnostic information derived from structured interviews exceeds that obtained from unstructured interviews (Rogers, 1995).
As previously mentioned, the most common type of reliability assessed and reported for structured diagnostic interviews is interrater reliability. Another measure of reliability that is examined in structured diagnostic interviews, as well as other interviews, is test–retest reliability: the consistency of scores or diagnoses across time. We expect that, in general, individuals should receive similar scores or diagnoses when an interview is readministered. For example, a patient assigned a diagnosis of major depressive disorder based on a structured interview would be expected to receive the same diagnosis if reinterviewed (using the same structured interview) the next day. We expect the test–retest reliability of an interview to be quite high when the intervening time period between the initial testing and the retest is short (hours or a few days). However, when the intervening time period is long (months or years), test–retest reliability typically suffers. One reason, especially when assessing "current" mental disorder diagnoses, is that the psychological status of the patient may have changed. For example, the fact that a patient does not again receive a major depressive disorder diagnosis at 6-month retest is not necessarily an indictment of our structured interview. Because major depressive episodes can be of relatively short duration, our interview may be quite accurate in revealing no diagnosis at retest. The point is that the level of test–retest reliability that is obtained must be interpreted in the
