Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

EXAMPLE (continued)
Alternative calculation of sub-area:
Rewrite
\[ n_p = 100 \times 200 \]
as
\[ (10 + 50 + 20 + 20) \times (2 + 48 + 50 + 100) \]
Classification information for different cutpoints (cp)
(one row per covariate pattern of X1 and X2)

P(X)      Cases   Noncases
c0 = 1        0          0
c1           10          2
c2           50         48
c3           20         50
c4           20        100

\[
\begin{aligned}
(10 + 50 + 20 + 20) \times (2 + 48 + 50 + 100)
 &= 20_t + 480_c + 500_c + 1{,}000_c \\
 &\quad + 100_d + 2{,}400_t + 2{,}500_c + 5{,}000_c \\
 &\quad + 40_d + 960_d + 1{,}000_t + 2{,}000_c \\
 &\quad + 40_d + 960_d + 1{,}000_d + 2{,}000_t
\end{aligned}
\]
Concordant terms (same values as in geometrical diagram):
\[ 480_c + 500_c + 1{,}000_c + 2{,}500_c + 5{,}000_c + 2{,}000_c = 11{,}480 \text{ concordant pairs } (= w) \]
Tied terms (twice each triangular area):
\[ 20_t + 2{,}400_t + 1{,}000_t + 2{,}000_t = 5{,}420 \text{ ties } (= z) \]
Discordant terms:
\[ 100_d + 40_d + 960_d + 40_d + 960_d + 1{,}000_d = 3{,}100 \text{ discordant pairs} \]
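Rescaled Area
[Figure: ROC plot with Se on the vertical axis (10%, 60%, 80%, 100%) and 1 – Sp on the horizontal axis (0%, 1%, 25%, 50%, 100%). Rescaled sub-areas under the curve: rectangles 240, 250, 500, 1250, 2500, 1000; triangles 5, 600, 250, 500.]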
An alternative way to obtain the sub-area values, without having to calculate each sub-area geometrically, is to rewrite the product formula for the total number of case/noncase pairs, as shown at the left.
Each term in the sum on the left side of this product gives the number of cases with the same predicted risk (i.e., $\hat{P}(X)$) at one of the cut-points used to form the ROC. Similarly, each term in the sum on the right side gives the number of noncases with the same $\hat{P}(X)$ at each cut-point.
We then multiply the two partitioned terms in the product formula to obtain 16 different terms, as shown at the left. Those terms identified with the subscript “t” denote tied pairs, those terms with the subscript “c” denote concordant pairs, and those terms with the subscript “d” denote discordant pairs.
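The 16-term expansion can be reproduced with a short Python sketch. It assumes the four groups are ordered from highest to lowest predicted risk (c1 to c4), which is what the t/c/d labels in the expansion imply; the variable and function names are illustrative only.

```python
# Counts at each cut-point, ordered from highest to lowest predicted risk
# (c1, c2, c3, c4). This ordering is an assumption inferred from the labels.
cases = [10, 50, 20, 20]
noncases = [2, 48, 50, 100]

def pair_type(i, j):
    """Label the pairs formed by case group i and noncase group j."""
    if i == j:
        return "t"  # same predicted risk: tied
    if i < j:
        return "c"  # case has the higher predicted risk: concordant
    return "d"      # case has the lower predicted risk: discordant

totals = {"c": 0, "t": 0, "d": 0}
for i, n_case in enumerate(cases):
    row = []
    for j, n_non in enumerate(noncases):
        kind = pair_type(i, j)
        totals[kind] += n_case * n_non
        row.append(f"{n_case * n_non:,}{kind}")
    print("  ".join(row))  # one row of the 4 x 4 expansion

w, z, d = totals["c"], totals["t"], totals["d"]
print(w, z, d, w + z + d)  # 11480 5420 3100 20000
```

The three totals add up to the 100 × 200 = 20,000 case/noncase pairs, so every pair is counted exactly once.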
The six values with the subscript “c” are exactly the same as the six concordant areas shown in the geometrical diagram given earlier. The sum of these six values, therefore, gives the total area under the ROC curve for concordant pairs (i.e., w).
The four values with the subscript “t” are exactly twice the four triangular areas under the ROC curve. Their sum therefore gives the total number of tied pairs (i.e., z), which is twice the tied area under the ROC curve.
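As a rough check on the link between pair counts and geometry, the sketch below walks along the ROC curve in count units (cases on the vertical axis, noncases on the horizontal axis) and accumulates the rectangles and triangles under it. It assumes the groups are ordered from highest to lowest predicted risk and that AUC is the area under the curve divided by the total number of pairs; the variable names are illustrative.

```python
# Counts at each cut-point, highest predicted risk first (assumed ordering).
cases = [10, 50, 20, 20]
noncases = [2, 48, 50, 100]

rect_total = 0   # rectangles under the curve (concordant pairs)
tri_total = 0.0  # triangles under the curve (half of the tied pairs)
height = 0       # cases already classified positive at earlier cut-points

for n_case, n_non in zip(cases, noncases):
    rect_total += height * n_non     # full-height block below this segment
    tri_total += n_case * n_non / 2  # triangle under the sloped segment
    height += n_case

n_pairs = sum(cases) * sum(noncases)
auc = (rect_total + tri_total) / n_pairs
print(rect_total, tri_total, auc)
```

Under these assumptions the rectangles total w = 11,480 and the triangles total z/2 = 2,710, so the area under the curve is w + z/2 out of 20,000 pairs.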
The remaining six terms identify portions of the area above the ROC curve corresponding to discordant pairs. These are not used to compute AUC.
Note that we can rescale the height and width of the rectangle to 100% × 100%, which will portray the dimensions of the rectangular area in (Se × 1 – Sp) percent mode. To do this, the value in each subarea under the curve needs to be halved, as shown at the left, because the rescaled rectangle has area 10,000 rather than 100 × 200 = 20,000.
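The rescaling can be illustrated by recomputing each sub-area directly in percent units. This is a minimal sketch, assuming the same group ordering (highest predicted risk first); `se_step` and `fp_step` are hypothetical names for the increments in Se and 1 – Sp between adjacent cut-points.

```python
# Counts at each cut-point, highest predicted risk first (assumed ordering).
cases = [10, 50, 20, 20]
noncases = [2, 48, 50, 100]

# Increments along each axis, in percent.
se_step = [100 * c / sum(cases) for c in cases]        # 10, 50, 20, 20
fp_step = [100 * n / sum(noncases) for n in noncases]  # 1, 24, 25, 50

k = len(cases)
# Rectangles under the curve: an earlier (higher-risk) case group paired
# with a later (lower-risk) noncase group.
rects = [se_step[i] * fp_step[j] for i in range(k) for j in range(i + 1, k)]
# Triangles under the curve: case and noncase groups at the same cut-point.
triangles = [se_step[i] * fp_step[i] / 2 for i in range(k)]

total = sum(rects) + sum(triangles)
print(rects)
print(triangles)
print(total, "out of", 100 * 100)
```

Each rectangle is half of the corresponding concordant count (for example, 480 becomes 240), and each triangle is one quarter of the corresponding tied count (for example, 2,400 becomes 600), since a triangle is already half of its tied term before rescaling.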
364 10. Assessing Discriminatory Performance of a Binary Logistic Model
