changes in aromatic ring substituents could reduce the potency for
hERG binding [105]. This conclusion is consistent with the obser-
vations of Braga et al. based on 4980 compounds, which indicated
that removing carbons, changing the electronic environment
around the basic nitrogen, and adding a hydroxyl group could
reduce the potency of a compound inhibiting hERG [106].
A number of STR models have been developed with multiple
machine learning methods, such as kNN, ANN, SVM, and RF, for
the prediction of hERG blockage [98]. The first STR model was
published by Roche et al., who defined three classes using the
cutoffs IC50 = 1 μM and IC50 = 10 μM [107]. PLS, self-organizing
maps, principal component analysis, and supervised
neural networks were used to build classification models.
Among them, the model using supervised neural networks showed
the best performance, in which 93% of nonblockers and 71% of
blockers were predicted correctly. Li et al. docked 495 compounds
in a homology model of hERG based on the KvAP template and
calculated pharmacophore-based GRIND descriptors, including
hydrophobic interaction, hydrogen bond acceptor and donor, and
molecular shape descriptors [108]. The descriptors were then
fed into an SVM classifier to build classification models at
thresholds of 1, 5, 10, 20, 30, and 40 μM. The model
was tested on an external set of 66 compounds and a large data set
containing 1948 compounds and achieved accuracies of
72% and 73%, respectively. Wang et al. used NB and recursive
partitioning (RP) to establish hERG classification models based on
806 compounds [109]. When the threshold was 1 μM, the Bayesian
classifier based on 14 molecular properties and the LCFP_8 fingerprint
achieved the highest global accuracy of 91.5% for the training set
and 88.3% for the test set.
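
The threshold-based labeling and the per-class hit rates reported above can be illustrated with a minimal Python sketch. It assumes IC50 values given in μM and treats a compound as a blocker when its IC50 falls below the chosen threshold. The class names, the handling of values lying exactly on a cutoff, and the toy data are hypothetical, and nothing here reproduces the cited models or their data sets.

```python
from typing import Dict, List, Sequence


def three_class_label(ic50_um: float, low: float = 1.0, high: float = 10.0) -> str:
    """Bin an IC50 value (in uM) into three classes using two cutoffs.

    The class names and the treatment of values exactly at a cutoff
    are assumptions made for illustration only.
    """
    if ic50_um < low:
        return "strong"
    if ic50_um < high:
        return "moderate"
    return "weak"


def binary_labels(ic50_um: Sequence[float], threshold_um: float) -> List[int]:
    """Return 1 (blocker) when IC50 is below the threshold, else 0 (nonblocker)."""
    return [1 if value < threshold_um else 0 for value in ic50_um]


def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely, yielding NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute global accuracy and the fraction of each class predicted correctly."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # blockers found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # nonblockers found
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "blocker_rate": _ratio(tp, tp + fn),
        "nonblocker_rate": _ratio(tn, tn + fp),
    }


# Hypothetical toy data: measured IC50 values (uM) and made-up model predictions.
toy_ic50 = [0.3, 0.8, 4.0, 12.0, 55.0, 9.0]
toy_pred = [1, 0, 0, 0, 1, 0]

toy_true = binary_labels(toy_ic50, threshold_um=1.0)
print([three_class_label(value) for value in toy_ic50])
print(classification_rates(toy_true, toy_pred))
```

Reporting the blocker and nonblocker rates alongside global accuracy matters because changing the threshold shifts the class balance, so a single accuracy figure can hide poor performance on the minority class.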
5 Conclusions
A variety of in silico models for acute toxicity have been
established with the aim of saving experimental resources in the
early stage of drug development. However, a major breakthrough in
prediction accuracy remains difficult because sufficiently large
data sets are lacking. Therefore, most previous prediction
models improved their performance by limiting model coverage.
Future efforts will be devoted to enriching the data sets with diverse
structures and a broad activity distribution.
Cancer is one of the leading causes of death, and it is necessary
to identify chemical carcinogenicity as early as possible. The efficiency
of machine learning models for carcinogenicity depends on
reliable and sufficient experimental data. In general, in silico
models for nongenotoxic carcinogenicity have performed worse than
those for genotoxic carcinogenicity. Moreover, global models
258 Jing Lu et al.