context of the nature of the variable (a brief state or
temporary syndrome vs. a long-standing personality
trait) as well as the length of the intervening time
period between test and retest. When test–retest reli-
ability is low, this may be due to a host of factors,
including subjects’ tendency to report fewer symp-
toms at retest, subjects’ boredom or fatigue at retest,
or the effect of variations in mood on the report of
symptoms (Sher & Trull, 1996). Table 6-4 describes
reliability indices for structured interviews.
Table 6-5 presents a hypothetical data set from a
study assessing the reliability of alcoholism diagnoses
derived from a structured interview. This example
assesses interrater reliability (the level of agreement
between two raters), but the calculations would be
the same if one wanted to assess test–retest reliability.
In that case, the data for Rater 2 would be replaced
by data for Testing 2 (Retest). As can be seen, the
two raters evaluated the same 100 patients for the
presence/absence of an alcoholism diagnosis, using
a structured interview. These two raters agreed in
90% of the cases [(30 + 60)/100]. Agreement here
refers to coming to the same conclusion—not just
agreeing that the diagnosis is present but also that
the diagnosis is absent. Table 6-5 also presents the
calculation for kappa—a chance-corrected index of
agreement that is typically lower than overall agree-
ment. The reason for this lower value is that raters
will agree on the basis of chance alone in situations
where the prevalence rate for a diagnosis is relatively
high or relatively low. In the example shown in
Table 6-5, we see that the diagnosis of alcoholism
is relatively infrequent.
Therefore, a rater who always judged the disor-
der to be absent would be correct (and likely to agree
with another rater) in many cases. The kappa
coefficient takes into account such instances of agree-
ment based on chance alone and adjusts the agree-
ment index (downward) accordingly. In general, a
kappa value between .75 and 1.00 is considered to
reflect excellent interrater agreement beyond chance
(Cicchetti, 1994).
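
The arithmetic behind this chance correction can be illustrated with a short Python sketch. The function name cohens_kappa and the variable names below are chosen for the example; the cell counts are those shown in Table 6-5.

```python
# A minimal sketch of the chance-corrected agreement index (kappa) for a
# 2 x 2 table like Table 6-5. Cell counts: a = both raters say "present",
# b and c = the two raters disagree, d = both raters say "absent".
def cohens_kappa(a, b, c, d):
    n = a + b + c + d
    observed = (a + d) / n                                    # overall agreement
    chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # agreement expected by chance
    return (observed - chance) / (1 - chance)

# Table 6-5 data: 30 + 60 agreements and 5 + 5 disagreements among 100 patients
print(cohens_kappa(30, 5, 5, 60))  # about .78, lower than the .90 overall agreement
```

Because the diagnosis is absent in most cases, much of the raw agreement could arise by chance, which is why kappa falls below the overall agreement figure.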

Validity

The validity of any type of psychological measure
can take many forms. Content validity refers to the
measure’s comprehensiveness in assessing the vari-
able of interest. In other words, does it adequately

T A B L E 6-5 Diagnostic Agreement Between Two Raters

                            Rater 2
                      Present      Absent
Rater 1   Present     30 (a)        5 (b)
          Absent       5 (c)       60 (d)

N = 100

Overall Agreement = (a + d)/N = 90/100 = .90

Kappa = [(a + d)/N - ((a + b)(a + c) + (c + d)(b + d))/N²]
        / [1 - ((a + b)(a + c) + (c + d)(b + d))/N²]
      = (ad - bc)/[(ad - bc) + N(b + c)/2]
      = 1775/2275
      = .78

T A B L E 6-4 Common Types of Reliability That Are Assessed to Evaluate Interviews

Type of Reliability        Definition                                               Statistical Index

Interrater or              Index of the degree of agreement between two or more     Pearson’s r
interjudge reliability     raters or judges as to the level of a trait that is      Intraclass correlation
                           present or the presence/absence of a feature or          Kappa
                           diagnosis

Test–retest reliability    Index of the consistency of interview scores across      Pearson’s r
                           some period of time                                      Intraclass correlation
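
As a minimal sketch of the first index listed in Table 6-4, the Python example below correlates a hypothetical set of interview scores at Test and Retest; the scores and the helper pearson_r are invented for illustration and do not come from the text.

```python
# Test-retest reliability as a Pearson correlation between two administrations
# of the same interview to the same subjects (hypothetical data).
from statistics import mean

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var_x = sum((xi - mx) ** 2 for xi in x)
    var_y = sum((yi - my) ** 2 for yi in y)
    return cov / (var_x * var_y) ** 0.5

test   = [12, 8, 15, 10, 7, 14]    # symptom scores at Testing 1
retest = [11, 9, 14, 10, 6, 15]    # scores for the same subjects at Testing 2 (Retest)
print(pearson_r(test, retest))     # a high r indicates good test-retest reliability
```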
