models can provide explicit equations to explain which properties
make positive or negative contributions to the toxicity. Toropov
et al. developed a two-variable model using MLR coupled with the
genetic algorithm (GA) [23] for 28 benzene derivatives [24]. The
equation is shown in Eq. (1). X5Av characterizes the presence of
heteroatoms and of double and triple bonds in the compounds [25],
while BELe1 represents the information associated with
electronegativities, distances, and atom types [26, 27]. The negative
contributions of these two variables indicated that the nitro groups
had a greater impact on acute toxicity than the halogen atoms.
\[
\log\left[\frac{1}{\mathrm{LD}_{50}}\right] = -119.203\,\mathrm{X5Av} - 14.999\,\mathrm{BELe1} + 33.223 \tag{1}
\]
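As a minimal sketch, Eq. (1) can be evaluated directly once the two descriptors are known. The function name and the descriptor values below are hypothetical and chosen only for illustration; both coefficients are taken as negative, following the statement in the text that the two variables make negative contributions.

```python
def predict_log_inv_ld50(x5av: float, bele1: float) -> float:
    """Evaluate Eq. (1): log[1/LD50] = -119.203*X5Av - 14.999*BELe1 + 33.223.

    The negative signs are an assumption based on the text, which describes
    both descriptors as making negative contributions.
    """
    return -119.203 * x5av - 14.999 * bele1 + 33.223


# Hypothetical descriptor values, not taken from the original data set.
low = predict_log_inv_ld50(x5av=0.05, bele1=1.0)
high = predict_log_inv_ld50(x5av=0.10, bele1=1.0)

# With a negative coefficient, a larger X5Av gives a smaller log[1/LD50].
print(f"X5Av=0.05: {low:.4f}")
print(f"X5Av=0.10: {high:.4f}")
```

Because the model is linear, each coefficient can be read on its own: raising a descriptor by one unit changes the predicted log[1/LD50] by exactly that coefficient, which is what makes such equations easy to interpret.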
Prediction models derived from congeneric compounds often cover a
limited chemical space and therefore have a limited applicability
domain. Toxicological data have now been reported for a large number
of compounds, which span multiple structural types and biochemical
mechanisms. As the structural diversity and the number of compounds
in a data set increase, it becomes difficult to characterize the
structure-toxicity relationship with linear methods. In contrast,
nonlinear models, such as neural networks [28, 29] and support vector
machines (SVM) [30, 31], tend to yield better performance than linear
methods for such complex data sets [32–35]. SVM maps the features
into a high-dimensional space, where it solves a linear function
based on optimization theory; the calculations are simplified by
introducing a kernel function [30]. Wang et al. developed a QSTR
model based on a chemically diverse data set of 571 compounds for
predicting acute toxicity to the fathead minnow [36]. The authors
employed the GA to simultaneously select a descriptor subset and
optimize the SVM parameters. Eight descriptors associated with acute
toxicity, such as ALogP, εHOMO, εLUMO,
Table 1
Toxicity categories for acute toxicity defined by the US EPA

Acute toxicity         Category I        Category II       Category III      Category IV
                       (danger/poison)   (warning)         (caution)         (none required)
Oral (mg/kg)           ≤50               >50 and ≤500      >500 and ≤5000    >5000
Dermal (mg/kg)         ≤200              >200 and ≤2000    >2000 and ≤5000   >5000
Inhalation^a (mg/l)    ≤0.05             >0.05 and ≤0.5    >0.5 and ≤2       >2

^a 4 h exposure
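The category boundaries in Table 1 can be turned into a simple lookup, which is useful when continuous toxicity values have to be converted into class labels. The sketch below is illustrative only: the function and dictionary names are hypothetical, and it assumes that the upper bound of each category is inclusive (for example, an oral value of exactly 50 mg/kg falls in Category I).

```python
from bisect import bisect_left

# Upper bounds of Categories I-III for each route, as listed in Table 1.
# Oral and dermal values are in mg/kg; inhalation is in mg/l (4 h exposure).
EPA_UPPER_BOUNDS = {
    "oral": [50, 500, 5000],
    "dermal": [200, 2000, 5000],
    "inhalation": [0.05, 0.5, 2],
}
CATEGORIES = ["I", "II", "III", "IV"]


def epa_category(route: str, value: float) -> str:
    """Return the EPA acute toxicity category for a value on a given route."""
    if route not in EPA_UPPER_BOUNDS:
        raise ValueError(f"unknown route: {route!r}")
    if value < 0:
        raise ValueError("toxicity value must be non-negative")
    # bisect_left keeps a value equal to a bound in the lower (more toxic)
    # category, which matches the assumed inclusive upper bounds.
    return CATEGORIES[bisect_left(EPA_UPPER_BOUNDS[route], value)]


print(epa_category("oral", 50))          # boundary value, Category I
print(epa_category("dermal", 1500))      # between 200 and 2000, Category II
print(epa_category("inhalation", 2.5))   # above 2, Category IV
```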
Machine Learning-Based Modeling of Drug Toxicity 249