Extended Data Table 4 | Potential use of the AI system in two clinical applications
a
| | Sensitivity (%) (n = 414) | Specificity (%) (n = 25,422) | Simulated reduction of second reader workload (%) |
|---|---|---|---|
| AI as second reader (UK) | 66.66 | 96.26 | 87.98 |
| Existing workflow (UK) | 67.39 | 96.24 | - |
| 95% CI on the difference | (-2.68, 1.23) | (-0.13, 0.17) | - |
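The panel a simulation can be sketched as follows. This is a minimal illustration, not the authors' code. The function `simulate_ai_second_reader` and its inputs are hypothetical. The sketch assumes that every case has four boolean values: the first reader's recall decision, the AI decision, the final (consensus) decision recorded by the existing workflow, and the cancer outcome. A second human read is needed only when the AI and the first reader disagree, so the workload reduction equals the agreement rate. With the reported 12.02% disagreement, that gives 100 - 12.02 = 87.98%.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SimulationResult:
    sensitivity: float         # % of cancer cases recalled by the final decision
    specificity: float         # % of non-cancer cases not recalled
    workload_reduction: float  # % of cases needing no second human read


def simulate_ai_second_reader(
    first_reader: Sequence[bool],
    ai: Sequence[bool],
    consensus: Sequence[bool],
    cancer: Sequence[bool],
) -> SimulationResult:
    """Hypothetical sketch: the AI replaces the second reader when it agrees
    with the first reader; otherwise the consensus decision is used.

    Assumption: `consensus` holds the final decision recorded by the
    existing double-reading workflow for every case (True = recall).
    """
    final = []
    agreements = 0
    for first, machine, fallback in zip(first_reader, ai, consensus):
        if first == machine:
            agreements += 1         # no second human read needed
            final.append(first)
        else:
            final.append(fallback)  # disagreement: use the consensus opinion

    on_cancer = [d for d, y in zip(final, cancer) if y]
    on_normal = [d for d, y in zip(final, cancer) if not y]
    return SimulationResult(
        sensitivity=100.0 * sum(on_cancer) / len(on_cancer),
        specificity=100.0 * sum(not d for d in on_normal) / len(on_normal),
        workload_reduction=100.0 * agreements / len(final),
    )


# Toy example with five cases (illustrative data, not from the study).
result = simulate_ai_second_reader(
    first_reader=[True, False, False, True, False],
    ai=[True, False, True, True, False],
    consensus=[True, False, True, False, False],
    cancer=[True, False, False, True, False],
)
print(result)
```

In the toy example, the AI disagrees with the first reader on one case out of five. The simulated workload reduction is therefore 80%.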
b
| Triage status | Dataset | Sensitivity (%) (95% CI) | Specificity (%) (95% CI) | Reliability of triage decision (%) (95% CI) |
|---|---|---|---|---|
| Negative | UK | 99.63 (98.88, 100.0); n = 274 | 41.15 (40.57, 41.72); n = 25,443 | 99.99 (NPV) (99.97, 100.0); n = 10,471 |
| Negative | USA | 98.05 (96.12, 99.16); n = 359 | 34.79 (31.97, 37.60); n = 2,411 | 99.90 (NPV) (99.83, 99.96); n = 720 |
| Positive | UK | 41.24 (35.63, 47.08); n = 274 | 99.92 (99.89, 99.95); n = 25,443 | 85.69 (PPV) (79.66, 90.98); n = 132 |
| Positive | USA | 29.80 (25.21, 34.45); n = 359 | 99.90 (99.78, 99.97); n = 2,411 | 82.41 (PPV) (65.38, 94.71); n = 121 |
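The panel b metrics can be computed from per-case AI scores under one reading of the table. That reading is an assumption; the caption does not state it. Under this reading:

- A case is triaged negative if its score falls below the negative-triage threshold.
- A case is triaged positive if its score reaches the positive-triage threshold.
- Sensitivity and specificity describe "score at or above the threshold" at each operating point.
- The reliability column is the NPV of the triaged-negative set or the PPV of the triaged-positive set, and n is the size of that set.

This reading is consistent with the UK negative-triage row. Of the 25,443 non-cancer cases, 41.15% is about 10,470. About 0.37% of the 274 cancers (roughly one case) also fall below the threshold, which gives the 10,471 triaged-negative cases.

The function `triage_metrics` and the thresholds in the example are hypothetical. The sketch also ignores any case weighting or resampling the original analysis may have used.

```python
from typing import NamedTuple, Sequence


class TriageResult(NamedTuple):
    sensitivity: float  # % of cancers with score >= threshold
    specificity: float  # % of non-cancers with score < threshold
    reliability: float  # NPV (negative triage) or PPV (positive triage), %
    n_triaged: int      # number of cases given this triage status


def triage_metrics(
    scores: Sequence[float],
    cancer: Sequence[bool],
    threshold: float,
    status: str,
) -> TriageResult:
    """Hypothetical sketch of one triage operating point.

    Assumption: 'negative' triage means score < threshold and
    'positive' triage means score >= threshold.
    """
    flagged = [s >= threshold for s in scores]
    n_cancer = sum(cancer)
    n_normal = len(cancer) - n_cancer
    tp = sum(f and y for f, y in zip(flagged, cancer))
    tn = sum(not f and not y for f, y in zip(flagged, cancer))
    fn, fp = n_cancer - tp, n_normal - tn

    if status == "negative":
        n_triaged = tn + fn                   # cases below the threshold
        reliability = 100.0 * tn / n_triaged  # NPV
    elif status == "positive":
        n_triaged = tp + fp                   # cases at or above the threshold
        reliability = 100.0 * tp / n_triaged  # PPV
    else:
        raise ValueError("status must be 'negative' or 'positive'")

    return TriageResult(
        sensitivity=100.0 * tp / n_cancer,
        specificity=100.0 * tn / n_normal,
        reliability=reliability,
        n_triaged=n_triaged,
    )


# Illustrative scores and outcomes (not study data).
scores = [0.05, 0.10, 0.30, 0.60, 0.92, 0.97, 0.20, 0.85]
cancer = [False, False, False, False, False, True, True, True]
print(triage_metrics(scores, cancer, threshold=0.15, status="negative"))
print(triage_metrics(scores, cancer, threshold=0.90, status="positive"))
```

A low negative-triage threshold keeps sensitivity near 100%, so almost no cancers are triaged away. A high positive-triage threshold keeps specificity near 100%. A fuller implementation would also guard against an empty triage set, which makes the reliability undefined.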
a, Simulation on the UK test set in which the AI system is used in place of the second reader whenever it concurs with the first reader. In cases of disagreement (12.02% of cases), the consensus opinion was invoked. The high performance of this combination of human and machine suggests that approximately 88% of the second reader's workload could be eliminated while maintaining the standard of care produced by double reading. The decision of the AI system was generated using the first reader operating point (i) shown in Fig. 2a. Confidence intervals are Wald intervals computed with the Obuchowski correction for clustered data.

b, Evaluation of the AI system for low-latency triage. Operating points were set to achieve a high negative predictive value (NPV) for negative triage and a high positive predictive value (PPV) for positive triage, for detecting cancer within 12 months.
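The "95% CI on the difference" row of panel a compares two workflows evaluated on the same cases, so the two proportions are paired. The sketch below shows a plain Wald interval for a paired difference of proportions. It uses only the discordant counts b and c, with variance [(b + c)/n - ((b - c)/n)^2]/n.

The sketch deliberately omits the Obuchowski correction for clustered data cited in the caption. It therefore treats every case as independent and may understate the interval width when cases are correlated. The function name `paired_wald_diff` is hypothetical. For a sensitivity difference, pass the two workflows' recall decisions restricted to cancer cases.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def paired_wald_diff(
    workflow_a: Sequence[bool],
    workflow_b: Sequence[bool],
    level: float = 0.95,
) -> Tuple[float, float, float]:
    """Difference in positive rate (A minus B), in percentage points, with a
    plain paired Wald interval. No clustering correction is applied."""
    if len(workflow_a) != len(workflow_b) or not workflow_a:
        raise ValueError("need two non-empty decision lists of equal length")
    n = len(workflow_a)
    # Discordant pairs: only these contribute to the difference.
    b = sum(x and not y for x, y in zip(workflow_a, workflow_b))
    c = sum(y and not x for x, y in zip(workflow_a, workflow_b))
    diff = (b - c) / n
    var = ((b + c) / n - diff ** 2) / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    half_width = z * math.sqrt(max(var, 0.0))
    return 100 * diff, 100 * (diff - half_width), 100 * (diff + half_width)


# Illustrative decisions on ten cancer cases (not study data).
ai_workflow = [True, True, True, False, True, False, True, True, False, True]
existing = [True, False, True, False, True, True, True, True, False, False]
print(paired_wald_diff(ai_workflow, existing))
```

Concordant cases cancel out of the difference, so only the b and c counts drive both the estimate and its variance.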