the high variance of the individual models. The consensus model
showed better performance than any individual model.
Lu et al. developed four kinds of local lazy learning (LLL)
models, including local lazy regression (LLR), SA, SR, and GP,
for LD50 prediction in rats [55]. SA, SR, and GP are based directly
on the LD50 values of the query's neighbors, whereas LLR uses
the nearest neighbors together with one selected descriptor to
build a local linear model. LLR therefore carries a higher risk of
generating meaningless predictions than the other models. With
training set I (3472 compounds), the GP model achieved the
best performance, yielding an R^2 of 0.413 and an MAE of 0.550 on the
test set (Table 2). Interestingly, LLR showed better predictive
ability for query compounds outside the applicability domain. It is
therefore unsurprising that the consensus model obtained a
significantly higher R^2 and a lower MAE than any individual model,
indicating that the individual models explain complementary
portions of the variance in the LD50 data. Moreover, because a lazy
learner builds no global model in advance, the training set can be
updated simply and quickly when new data become available;
accordingly, 2271 compounds not in training set I were added to
form training set II (5743 compounds). As Table 2 shows, extending
the training set with structurally diverse compounds covering a broad
activity distribution significantly improved the performance of both
the individual and the consensus models.
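The contrast between neighbor-averaging models and LLR, the consensus step, and the cheap "update by adding compounds" property can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of Lu et al.: the function names are hypothetical; Euclidean distance in descriptor space, a plain mean over the k nearest neighbors (standing in for SA, SR, and GP, whose exact definitions are not reproduced here), an unweighted consensus mean, and R^2 computed as 1 - SS_res/SS_tot are all assumptions.

```python
import numpy as np

# Hypothetical sketch of local lazy learning (not the code of Lu et al. [55]).
# Assumptions: Euclidean distance, k nearest neighbors, plain averaging.


def nearest_neighbors(X_train, x_query, k):
    """Indices of the k training compounds closest to the query (Euclidean)."""
    d = np.linalg.norm(X_train - x_query, axis=1)
    return np.argsort(d)[:k]


def neighbor_average(X_train, y_train, x_query, k=3):
    """Stand-in for a neighbor-based model: mean of the neighbors' values.
    The result always lies within the range of the neighbors' values."""
    idx = nearest_neighbors(X_train, x_query, k)
    return float(np.mean(y_train[idx]))


def local_linear(X_train, y_train, x_query, k=3, j=0):
    """LLR-like model: fit y = a + b * x_j on the k neighbors only, using a
    single descriptor j, then evaluate at the query. This can extrapolate."""
    idx = nearest_neighbors(X_train, x_query, k)
    b, a = np.polyfit(X_train[idx, j], y_train[idx], deg=1)  # slope, intercept
    return float(a + b * x_query[j])


def consensus(predictions):
    """Consensus prediction: unweighted mean of the individual predictions."""
    return float(np.mean(predictions))


def r2_and_mae(y_true, y_pred):
    """R^2 as 1 - SS_res/SS_tot, and mean absolute error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot), float(np.mean(np.abs(y_true - y_pred)))


# Toy data: one descriptor, "log LD50" values on a perfect line y = 2x + 1.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = 2 * X_train[:, 0] + 1
query = np.array([10.0])  # far outside the range of the training data

p_avg = neighbor_average(X_train, y_train, query)  # 7.0
p_lin = local_linear(X_train, y_train, query)      # about 21.0
print(p_avg, round(p_lin, 6), consensus([p_avg, round(p_lin, 6)]))

# "Upgrading" a lazy learner: just append new compounds; nothing is retrained.
X_train = np.vstack([X_train, [[9.0]]])
y_train = np.append(y_train, 19.0)
print(neighbor_average(X_train, y_train, query))
```

In the toy data the three nearest neighbors of the query have values 9, 7, and 5, so the neighbor average stays at 7.0, while the one-descriptor local line extrapolates to about 21. When the local trend is real this is harmless, but when the fit is driven by noise in a handful of neighbors, the same extrapolation illustrates how an LLR-type model can return meaningless values. The last two lines show why updating is cheap: appending a compound changes the next prediction without any retraining step.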
2.2 Structure-Toxicity Relationship (STR) Models for Acute Toxicity
In addition to the QSTR models described above, several STR models have
been developed to classify compounds as toxic or nontoxic.
Xue et al. compared five machine learning methods (SVM,
kNN, logistic regression [56], C4.5 decision tree [57], and
probabilistic neural network [58]) for predicting Tetrahymena pyriformis
toxicity, based on 1129 compounds with known IGC50 values
[59]. The SVM model using 49 selected descriptors performed best,
yielding an overall accuracy of 96.8% and a Matthews correlation
coefficient (MCC) of 0.916 on the test set.
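Because the MCC is less familiar than accuracy, a short Python sketch of both metrics for a two-class (toxic/nontoxic) problem may help. The function name `accuracy_and_mcc`, the 1/0 label coding, and the toy labels are illustrative assumptions, not data or code from Xue et al. The formula is the standard MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), which ranges from -1 to 1, so an MCC reported as a percentage (e.g., 91.6%) corresponds to 0.916 on that scale.

```python
import math


def accuracy_and_mcc(y_true, y_pred):
    """Overall accuracy and Matthews correlation coefficient (MCC) for binary
    labels. Assumed coding: 1 = toxic, 0 = nontoxic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    return accuracy, mcc


# Toy example (not data from Xue et al.): 3 toxic and 5 nontoxic compounds.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc, mcc = accuracy_and_mcc(y_true, y_pred)
print(acc, round(mcc, 3))  # 0.75 0.467
```

On the toy labels the accuracy is 0.75 but the MCC is only about 0.47, because the MCC uses all four cells of the confusion matrix and penalizes errors on the smaller class; this is why it is often reported alongside accuracy for imbalanced toxic/nontoxic data sets.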
Li et al. developed multiclass classification models for 12,204
compounds with rat LD50 values based on the US EPA toxicity
categories [12]. Five machine learning methods, including SVM, RF,
Table 2
Performance of the GP model and the consensus model on the test set

| Model | R^2 (training set I, 3472 compounds) | MAE (training set I) | R^2 (training set II, 5743 compounds) | MAE (training set II) |
|---|---|---|---|---|
| GP | 0.413 | 0.550 | 0.587 | 0.436 |
| Consensus model | 0.466 | 0.510 | 0.619 | 0.422 |
Machine Learning-Based Modeling of Drug Toxicity 251