Introductory Biostatistics


3.1.5 Measuring Agreement


Many research studies rely on an observer’s judgment to determine whether a
disease, a trait, or an attribute is present or absent. For example, results of ear
examinations will certainly affect a comparison of competing treatments for
ear infection. The basic concern, of course, is reliability.
Sections 1.1.2 and 3.1.4 dealt with one important aspect of reliability, the
validity of the assessment. However, judging a method’s validity requires an exact
method of classification, or gold standard, so that sensitivity and specificity can
be calculated. When no exact method is available, reliability can only be judged
indirectly, in terms of reproducibility; the most common way of doing so is to
measure the agreement between examiners.
For simplicity, assume that each of two observers independently assigns
each of n items or subjects to one of two categories. The sample may then be
enumerated as counts in a 2×2 table (Table 3.4) or in terms of the cell probabilities
(Table 3.5). Using these frequencies, we can define:



  1. An overall proportion of concordance:

$$\frac{n_{11} + n_{22}}{n}$$


  2. Category-specific proportions of concordance:

$$C_1 = \frac{2n_{11}}{2n_{11} + n_{12} + n_{21}}$$

$$C_2 = \frac{2n_{22}}{2n_{22} + n_{12} + n_{21}}$$
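The three proportions above depend only on the four cell counts of Table 3.4. The sketch below computes them directly from those counts. It is a minimal illustration that assumes Python 3.9+ and made-up counts. The class name `AgreementTable` and its methods are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgreementTable:
    """Cell counts of a 2x2 table of two observers' ratings (Table 3.4 layout).

    n11: both observers chose category 1
    n12: observer 1 chose category 1, observer 2 chose category 2
    n21: observer 1 chose category 2, observer 2 chose category 1
    n22: both observers chose category 2
    """
    n11: int
    n12: int
    n21: int
    n22: int

    @property
    def n(self) -> int:
        # Total number of items rated by both observers
        return self.n11 + self.n12 + self.n21 + self.n22

    def overall_concordance(self) -> float:
        # (n11 + n22) / n: share of items on which the two observers agree
        return (self.n11 + self.n22) / self.n

    def category_concordance(self) -> tuple[float, float]:
        # C1 = 2 n11 / (2 n11 + n12 + n21), C2 = 2 n22 / (2 n22 + n12 + n21).
        # Raises ZeroDivisionError if a category is never used and there are
        # no disagreements; this sketch does not handle that edge case.
        disagree = self.n12 + self.n21
        c1 = 2 * self.n11 / (2 * self.n11 + disagree)
        c2 = 2 * self.n22 / (2 * self.n22 + disagree)
        return c1, c2


# Hypothetical counts for 100 items, used only for illustration.
table = AgreementTable(n11=40, n12=5, n21=10, n22=45)
print(table.overall_concordance())                          # 0.85
print([round(c, 3) for c in table.category_concordance()])  # [0.842, 0.857]
```

The denominator of C_1 equals the sum of the two observers' category-1 totals, n_{1+} + n_{+1}. As a result, C_1 is the number of category-1 agreements divided by the average number of items the two observers placed in category 1, and the same reading applies to C_2.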

TABLE 3.4

                        Observer 2
Observer 1      Category 1    Category 2    Total
Category 1      $n_{11}$      $n_{12}$      $n_{1+}$
Category 2      $n_{21}$      $n_{22}$      $n_{2+}$
Total           $n_{+1}$      $n_{+2}$      $n$


TABLE 3.5

                        Observer 2
Observer 1      Category 1    Category 2    Total
Category 1      $p_{11}$      $p_{12}$      $p_{1+}$
Category 2      $p_{21}$      $p_{22}$      $p_{2+}$
Total           $p_{+1}$      $p_{+2}$      1.0
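The enumeration step behind Tables 3.4 and 3.5 can be sketched as follows. This sketch assumes each observer's ratings are stored as a Python list of category labels 1 or 2, with both lists in the same item order. The helper `tabulate_ratings` and the sample ratings are hypothetical and serve only as illustration. The function cross-classifies the paired ratings into the counts n_ij, the marginal totals n_{i+} and n_{+j}, and the cell proportions p_ij = n_ij / n.

```python
from collections import Counter


def tabulate_ratings(ratings1, ratings2):
    """Cross-classify paired ratings (categories 1 or 2) into a 2x2 table.

    Returns (counts, row_totals, col_totals, proportions), where
    counts[(i, j)] is n_ij, row_totals[i] is n_{i+}, col_totals[j] is n_{+j},
    and proportions[(i, j)] is p_ij = n_ij / n.
    """
    if len(ratings1) != len(ratings2):
        raise ValueError("both observers must rate the same items")
    if not ratings1:
        raise ValueError("at least one rated item is required")
    if any(r not in (1, 2) for r in (*ratings1, *ratings2)):
        raise ValueError("ratings must be category 1 or 2")

    n = len(ratings1)
    pairs = Counter(zip(ratings1, ratings2))  # (observer 1, observer 2) pairs
    counts = {(i, j): pairs.get((i, j), 0) for i in (1, 2) for j in (1, 2)}
    row_totals = {i: counts[(i, 1)] + counts[(i, 2)] for i in (1, 2)}
    col_totals = {j: counts[(1, j)] + counts[(2, j)] for j in (1, 2)}
    proportions = {cell: c / n for cell, c in counts.items()}
    return counts, row_totals, col_totals, proportions


r1 = [1, 1, 2, 2, 1, 2, 1, 2]  # hypothetical ratings, observer 1
r2 = [1, 2, 2, 2, 1, 1, 1, 2]  # hypothetical ratings, observer 2
counts, rows, cols, props = tabulate_ratings(r1, r2)
print(counts)  # {(1, 1): 3, (1, 2): 1, (2, 1): 1, (2, 2): 3}
print(props)   # {(1, 1): 0.375, (1, 2): 0.125, (2, 1): 0.125, (2, 2): 0.375}
```

Because every p_ij is the corresponding n_ij divided by the same n, the concordance proportions keep the same form when written in terms of cell probabilities. For example, C_1 = 2p_{11} / (2p_{11} + p_{12} + p_{21}).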


118 PROBABILITY AND PROBABILITY MODELS
