Computational Systems Biology Methods and Protocols.7z

the high variance of the individual models. The consensus model showed better performance than any individual model. Lu et al. developed four kinds of local lazy learning (LLL) models, including local lazy regression (LLR), SA, SR, and GP, for LD50 prediction in rats [55]. SA, SR, and GP are based directly on the LD50 values of the query's neighbors, while LLR relies on the nearest neighbors as well as one selected descriptor used to build a linear model. Therefore, LLR has a higher risk of generating meaningless results than the other models. For training set I with 3472 compounds, the GP model achieved the best performance among the individual models, yielding an R^2 of 0.413 and an MAE of 0.550 for the test set (Table 2). Interestingly, LLR showed better predictive ability for query compounds outside the applicability domain. It is therefore hardly surprising that the consensus model obtained a significantly higher R^2 and lower MAE than any individual model, which indicates that the individual models explain complementary portions of the variance in the LD50 data. Moreover, the training set can be updated simply and quickly when new data become available, so 2271 compounds not in training set I were added to form training set II. The results listed in Table 2 demonstrate that the performance of both the individual and the consensus models was significantly improved by extending the training set with diverse structures and a broad activity distribution.
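The idea of predicting from a query's neighbors and then averaging several such predictors can be sketched as follows. This is a minimal illustration, not the method of Lu et al.: the excerpt does not define SA, SR, or GP, so the two predictors here (an unweighted mean and an inverse-distance-weighted mean of the neighbors' values) are assumptions, as are the Euclidean distance, the function names, and the made-up toy data. R^2 is computed as the coefficient of determination, which is one common definition; the source does not state which definition was used.

```python
from math import sqrt

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbors(query, train_X, k):
    """Indices of the k training compounds closest to the query."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(query, train_X[i]))
    return order[:k]

def predict_mean(query, train_X, train_y, k=3):
    """Hypothetical local model 1: unweighted mean of the neighbors' values."""
    idx = nearest_neighbors(query, train_X, k)
    return sum(train_y[i] for i in idx) / len(idx)

def predict_weighted(query, train_X, train_y, k=3, eps=1e-9):
    """Hypothetical local model 2: inverse-distance-weighted mean."""
    idx = nearest_neighbors(query, train_X, k)
    weights = [1.0 / (euclidean(query, train_X[i]) + eps) for i in idx]
    return sum(w * train_y[i] for w, i in zip(weights, idx)) / sum(weights)

def consensus(predictions):
    """Consensus prediction: plain average of the individual predictions."""
    return sum(predictions) / len(predictions)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up toy data: one descriptor per compound, invented activity values.
TRAIN_X = [[0.0], [1.0], [2.0], [10.0]]
TRAIN_Y = [1.0, 2.0, 3.0, 10.0]

query = [1.0]
individual = [
    predict_mean(query, TRAIN_X, TRAIN_Y),
    predict_weighted(query, TRAIN_X, TRAIN_Y),
]
consensus_prediction = consensus(individual)
```

Because nothing is fitted ahead of time, extending the training set only means appending rows to `TRAIN_X` and `TRAIN_Y`, which mirrors the simple updates described above.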

2.2 Structure-Toxicity Relationship (STR) Models for Acute Toxicity

In addition to multiple QSTR models, some STR models have been developed for the classification of toxic and nontoxic compounds. Xue et al. compared five machine learning methods (SVM, kNN, logistic regression [56], C4.5 decision tree [57], and probabilistic neural network [58]) for predicting Tetrahymena pyriformis toxicity based on 1129 compounds with known IGC50 values [59]. The results indicated that the SVM model using 49 selected descriptors showed the best performance, yielding an overall accuracy of 96.8% and a Matthews correlation coefficient of 91.6% for the test set. Li et al. developed multi-classification models for 12,204 compounds with rat LD50 values based on the US EPA toxicity categories [12]. Five machine learning methods, including SVM, RF,

Table 2
Performance of the GP model and the consensus model on the test set

Model              Training set I (3472 compounds)    Training set II (5743 compounds)
                   R^2      MAE                       R^2      MAE
GP                 0.413    0.550                     0.587    0.436
Consensus model    0.466    0.510                     0.619    0.422
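The size of the improvement from training set I to training set II can be read off Table 2 with simple arithmetic. The short sketch below only transcribes the four table rows and subtracts them; the dictionary layout and the `change` helper are illustrative names, not part of the source.

```python
# Test-set values transcribed from Table 2.
table2 = {
    "GP": {
        "I": {"r2": 0.413, "mae": 0.550},
        "II": {"r2": 0.587, "mae": 0.436},
    },
    "Consensus model": {
        "I": {"r2": 0.466, "mae": 0.510},
        "II": {"r2": 0.619, "mae": 0.422},
    },
}

def change(model, metric):
    """Training set II value minus training set I value, rounded to 3 decimals."""
    return round(table2[model]["II"][metric] - table2[model]["I"][metric], 3)

# Positive change in R^2 and negative change in MAE both indicate improvement.
changes = {
    model: {"r2": change(model, "r2"), "mae": change(model, "mae")}
    for model in table2
}
```

For both models R^2 rises and MAE falls with the larger training set, and the consensus model stays ahead of GP on both metrics in both settings.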

Machine Learning-Based Modeling of Drug Toxicity 251
