input, and their effects can be hard to identify ex ante. But the quality of parametric updates depends largely on the quality of the associated underlying data. An adaptive system with continuously changing parameters is susceptible to data quality issues that can arise from, for example, errors by AI/ML users or intentional adversarial attacks (6). The latter can take many forms. Consider a hypothetical example (6): In response to the opioid crisis, many insurers now use patient- or provider-level overdose risk-prediction algorithms to deny OxyContin prescriptions. A physician, certain that a patient is in need of a prescription, may learn that the patient can avoid the algorithmic gatekeeper and secure a prescription by typing in a combination of codes that will guarantee a low overdose-risk score. Such a system incentivizes physicians to enter low-quality data. An unchecked dynamic algorithm would inappropriately adapt to this over time—considering all outcomes of prescriptions—and begin to falsely categorize low-risk patients as high risk. In this kind of situation, continuous oversight can provide a necessary check on adaptive AI/ML systems.
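The following toy simulation (a hypothetical construction, not drawn from the cited example) illustrates this feedback loop in Python. Here scikit-learn's SGDClassifier stands in for the insurer's adaptive risk model, and a single "entered code" score stands in for the gameable codes:

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    model = SGDClassifier(loss="log_loss", random_state=0)  # logistic regression, fit incrementally

    # Honest history: the entered code score (x) tracks true overdose risk.
    x = rng.normal(size=(2000, 1))
    y = (rng.random(2000) < 1 / (1 + np.exp(-3 * x[:, 0]))).astype(int)
    model.partial_fit(x, y, classes=[0, 1])

    low_risk_record = np.array([[-2.0]])  # codes of a genuinely low-risk patient
    print("before gaming:", model.predict_proba(low_risk_record)[0, 1])  # near 0

    # Gaming: truly high-risk patients are entered with low-risk codes so the
    # prescription clears the gatekeeper; many later overdose, so records that
    # look low risk accumulate "overdose" labels (y = 1).
    for _ in range(30):
        x_gamed = np.full((100, 1), -2.0)
        model.partial_fit(x_gamed, np.ones(100, dtype=int))

    # The unchecked adaptive model now rates honest low-risk patients as high risk.
    print("after gaming:", model.predict_proba(low_risk_record)[0, 1])  # near 1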
In an attempt to steer between these two poles, the FDA released a discussion paper in April 2019 (1). Until now, the FDA has exclusively approved or cleared medical AI/ML-based software as a medical device—what the FDA calls "SaMD," which is software that is on its own a medical device and is not part of a hardware medical device (7)—with "locked" algorithms (1). A locked algorithm is defined by the FDA as "an algorithm that provides the same result each time the same input is applied to it and does not change with use" (1). Any AI/ML system can satisfy this definition provided it is fixed in advance.
However, most AI/ML algorithms are "adaptive," arguably their key strength. Even parameters in a simple model like a logistic regression will gradually evolve as the model is refit in response to new data.
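As a minimal illustration (assuming scikit-learn; the features and coefficients are invented), refitting even an ordinary logistic regression on accumulating data shifts the function the device computes away from the one that was reviewed:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)
    true_w = np.array([1.0, -2.0, 0.5])  # hypothetical true effect sizes

    def draw_cases(n):
        X = rng.normal(size=(n, 3))
        y = (rng.random(n) < 1 / (1 + np.exp(-X @ true_w))).astype(int)
        return X, y

    # Data available at premarket review.
    X, y = draw_cases(300)
    model = LogisticRegression().fit(X, y)
    print("coefficients at approval:", model.coef_.round(2))

    # Each refit on accumulated data nudges the parameters further.
    for _ in range(3):
        X_new, y_new = draw_cases(300)
        X, y = np.vstack([X, X_new]), np.concatenate([y, y_new])
        model = LogisticRegression().fit(X, y)
        print("after refit:", model.coef_.round(2))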
For adaptive AI/ML-based SaMD, the FDA proposed a "total product lifecycle (TPLC) regulatory approach" that permits continuous improvement of such devices while maintaining their safety and effectiveness (1). The FDA's TPLC approach is a feature of the Software Precertification (Pre-Cert) Program that it is piloting on a small number of companies to determine its feasibility (2). One major idea in the FDA's April 2019 discussion paper is that AI/ML-based SaMD could be updated to a certain extent after marketing authorization; when seeking initial premarket review of an AI/ML-based SaMD, manufacturers would be given the option to submit a "predetermined change control plan," which would contain a description of anticipated modifications and an "Algorithm Change Protocol," including the associated methodology used to implement such changes (1).

UNDERSTANDING RISKS
Before considering adaptive algorithms, it is important to recognize that a "locked" algorithm, as defined, for example, by the FDA, could be more harmful than an "adaptive" one—and vice versa.
To begin with, the concept of "locked" is ambiguous. We focus on two definitions that we call "system lock" and "true function lock." Do we want the AI/ML system to continually use the locked estimate of the function, relating inputs and outputs, that was first approved? This is how the FDA has defined what it means to be "locked," a concept that we call "system lock." Merely achieving "system lock" will not guarantee that the system is safe for patients. An alternative, and perhaps preferable, goal is that the algorithm locks, as closely as possible, onto the true function that relates the inputs and outputs—which is unknown ex ante in practice and which emerges over time. We call this "true function lock."
For adaptive algorithms, it is especially important for regulators to assess whether the AI/ML system remains reliable overall as applied to new data—that is, whether it approaches "true function lock." Below, we identify several AI/ML features that, when not properly considered, can lead the AI/ML system to use a poor estimate of the true relationship between inputs and outputs and thereby possibly cause harm to patients (for example, through misdiagnosis). Regulators need to focus their attention on such issues in order to manage the risk that AI/ML systems learn and use a wrong input-output relation.

Concept drift
Concept drift describes a situation in which the true relation between inputs and outputs changes over time. This may happen because of a changing environment or because the model was misspecified (for example, the estimated function is linear when the actual relationship is quadratic, or relevant variables were omitted). Consider, for example, an AI/ML system trained to identify skin lesions as benign or malignant (8). The model presupposes an underlying distribution of these labels (benign versus malignant). However, the datasets that these AI/ML systems rely on typically do not track race or skin color, or may miss or underreport certain skin types. Yet the malignancy of skin lesions (the true relation between input and output/diagnosis) may vary across race and skin type. As a result, the same image can lead to two different probabilistic diagnoses, depending on the patient's underlying skin type or race—an omitted feature. This sort of problem is ubiquitous in medical AI/ML.
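A compact sketch of this omitted-variable failure (all features and coefficients are hypothetical, chosen only to make the effect visible):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 20_000
    lesion_size = rng.normal(size=n)        # observed, image-derived feature
    skin_type = rng.integers(0, 2, size=n)  # relevant feature NOT in the dataset

    # True relation: malignancy depends on both features.
    logit = 1.5 * lesion_size + 2.0 * skin_type - 1.0
    malignant = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    # The model is trained on lesion_size alone; skin_type is omitted.
    model = LogisticRegression().fit(lesion_size.reshape(-1, 1), malignant)

    # For the very same observed input, the true risk differs by subgroup,
    # but the model can output only one blended probability.
    x = np.array([[0.0]])
    print("model's single estimate:", model.predict_proba(x)[0, 1].round(2))
    for s in (0, 1):
        print(f"true risk, skin_type={s}:", round(1 / (1 + np.exp(-(2.0 * s - 1.0))), 2))

If the deployed population later shifts toward one subgroup, the input-output relation the model learned drifts away from the one it now faces.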
A regulatory regime requiring a "system lock" is not immune to this problem. Indeed, a "system locked" algorithm can make matters worse by prohibiting the system from learning. Moreover, a regime focused on predetermined change control plans is likewise vulnerable to risks arising from concept drift. Any predetermined change control plan risks being either uninformative or impractical, depending on the level of detail at which a maker would be expected to describe future modifications. At one extreme, a maker might be required to describe proposed changes in very general terms; this would be uninformative. At the other extreme, the maker might be required to describe precisely the sorts of changes it anticipates. Such a task is not feasible without having seen all possible future data from all types of patients and conditions—especially for AI/ML algorithms that may have thousands or millions of parameters. Even if this kind of task could be accomplished in theory, or with future technologies, it would be extremely difficult and time-consuming—and thus impractical. Moreover, such a plan could be especially harmful when unanticipated problems are reported, in which case the proposed framework could require another round of review.

Covariate shift
Covariate shift occurs when the input distribution of new data differs from that of the data on which the algorithm was trained or tested for approval (9). This can occur in the absence of concept drift, although the two are not mutually exclusive. For example, training data may have come from geographically centralized clinical sites, but the device may be deployed beyond those regions and populations. When this occurs, "system locking" the algorithm hampers the maker's ability to address the problem. Further, describing how the distribution of patients may change is not something a maker can usually do ex ante, because makers generally do not know the distribution of the data to which the algorithm will be applied.
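A simple monitoring check, sketched here under the assumption of a single numeric input and using SciPy's two-sample Kolmogorov-Smirnov test, is to compare the distribution the device was trained on against what it receives after deployment:

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(1)

    # Training inputs came from centralized trial sites (e.g., patient age).
    age_train = rng.normal(loc=50.0, scale=10.0, size=5_000)
    # After deployment, the device sees an older population elsewhere.
    age_deployed = rng.normal(loc=62.0, scale=12.0, size=5_000)

    stat, p_value = ks_2samp(age_train, age_deployed)
    if p_value < 0.01:
        # A "system locked" device cannot retrain; the alert can at least
        # trigger human review instead of silent degradation.
        print(f"covariate shift flagged: KS = {stat:.3f}, p = {p_value:.1e}")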

Instability
One major concern is treating similar patients similarly. That is, medically insignificant differences among patients should not lead to substantive differences in diagnosis or treatment. Suppose that when an AI/ML system is given a set of inputs, it produces one probabilistic output—for example, the probability that a particular skin lesion is malignant is 87%. Now suppose that very small changes are made to the set of inputs provided to the underlying algorithm. For ex-
