We then fit a logistic model P(X) to this data
set, and we compute the predicted probability
of being a case, i.e., $\hat{P}(X_i)$, for each of the 300
subjects.
For this dataset, the total number of possible
case/noncase pairs (i.e., $n_p$) is the product
100 × 200, or 20,000.
We now let $w$ denote the number of these pairs
for which $\hat{P}(X)$ for the case is larger than $\hat{P}(X)$
for the corresponding control. Suppose, for
example, that $w = 11{,}480$, which means that
in 57.4% of the 20,000 pairs, the case had a
higher predicted probability than its noncase
pair.
Now let $z$ denote the number of case/noncase
pairs in which both case and noncase had
exactly the same predicted probability.
Continuing our example, we suppose $z = 5{,}420$, so
that this result occurred for only 27.1% of the
20,000 pairs.
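The counting of $w$ and $z$ described above can be sketched in a few lines of Python. This is only an illustration: the function name `count_pairs` and the small lists of predicted probabilities are hypothetical and are not taken from the 300-subject dataset in the text.

```python
from itertools import product


def count_pairs(case_probs, noncase_probs):
    """Return (n_p, w, z) over all possible case/noncase pairs.

    case_probs and noncase_probs are lists of predicted probabilities
    for the cases and the noncases, respectively.
    """
    w = 0  # pairs in which the case has the larger predicted probability
    z = 0  # pairs in which the two predicted probabilities are exactly equal
    for p_case, p_noncase in product(case_probs, noncase_probs):
        if p_case > p_noncase:
            w += 1
        elif p_case == p_noncase:
            z += 1
    n_p = len(case_probs) * len(noncase_probs)  # total number of pairs
    return n_p, w, z


# Hypothetical predicted probabilities (assumed for illustration only).
case_probs = [0.8, 0.6, 0.6]
noncase_probs = [0.6, 0.3, 0.7, 0.2]

n_p, w, z = count_pairs(case_probs, noncase_probs)
print(n_p, w, z)  # prints: 12 8 2
```

With 3 cases and 4 noncases there are 12 pairs; the case has the larger predicted probability in 8 of them and the two probabilities are equal in 2.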
Then, for our example, the proportion of the
20,000 case-control pairs for which the case
has at least as large a predicted probability as
the control is $(w + z)/n_p$, which is
16,900/20,000, or 0.8450.
A modification of this formula (called “c”)
involves weighting by 0.5 any pair with equal
predicted probabilities; that is, the numerator is
modified to “$w + 0.5z$”, so that $c$ becomes 0.7095.
It is the latter modified formula that is equivalent
to the area under the ROC, i.e., AUC.
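As a quick check of the arithmetic, the two proportions can be computed directly from the counts given in the example. The sketch below is illustrative only; the function name `concordance` is an assumption, not something defined in the text.

```python
def concordance(w, z, n_p):
    """Return (p_d, c) from the pair counts.

    p_d counts tied pairs fully; c gives each tied pair a weight of 0.5.
    """
    p_d = (w + z) / n_p
    c = (w + 0.5 * z) / n_p
    return p_d, c


# Counts from the hypothetical example in the text:
# 100 cases, 200 noncases, w = 11,480, z = 5,420.
p_d, c = concordance(w=11_480, z=5_420, n_p=100 * 200)
print(f"p_d = {p_d:.4f}, c = {c:.4f}")  # prints: p_d = 0.8450, c = 0.7095
```

The only difference between the two quantities is the weight given to the tied pairs, which here lowers the value from 0.8450 to 0.7095.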
Based on the grading guidelines for AUC that
we provided in the previous section, the AUC of
0.7095 computed for this hypothetical example
would be considered to provide fair
discrimination (i.e., grade C).
In our presentation of the above AUC formula,
we have not explicitly demonstrated why this
formula actually works to provide the area
under the ROC curve.
Example: (Continued)
Fit logistic model $P(X)$ and compute $\hat{P}(X_i)$ for $i = 1, \ldots, 300$
$n_p = n_1 \times n_0 = 100 \times 200 = 20{,}000$
$w$ = no. of case/noncase pairs for which
$\hat{P}(X_{\text{case}}) > \hat{P}(X_{\text{noncase}})$
Example: Suppose $w = 11{,}480$
(i.e., 57.4% of 20,000)
$z$ = no. of case/noncase pairs for which
$\hat{P}(X_{\text{case}}) = \hat{P}(X_{\text{noncase}})$
Example: Suppose $z = 5{,}420$
(i.e., 27.1% of 20,000)
$$p_d = \frac{w + z}{n_p} = \frac{11{,}480 + 5{,}420}{20{,}000} = 0.8450$$
Modified formula:
$$c = \frac{w + 0.5z}{n_p} = \text{AUC}$$
$$c = \frac{11{,}480 + 0.5(5{,}420)}{20{,}000} = \frac{14{,}190}{20{,}000} = 0.7095$$
Interpretation from guidelines:
AUC = 0.7095 $\Rightarrow$ Fair discrimination (grade C)
How does the AUC formula provide the
geometrical area under the curve?
Illustrative example below.