the possible toxicity targets of the toxic compounds were identified
based on protein-ligand-based toxicophores and the structures of
similar reference compounds, which are valuable for studying the
biochemical mechanisms of acute toxicity.
3 Carcinogenicity
Chemical carcinogenicity is a serious threat to human health.
According to the regulatory authorities of the European Union,
Japan, and the USA, it is essential to perform carcinogenicity
studies before the marketing approval of medicines [72]. The conventional
test for carcinogenicity is the 2-year rodent carcinogenicity
assay, which is highly expensive, labor-intensive, and
time-consuming [73]. Chemical carcinogens can be categorized as
genotoxic and nongenotoxic/epigenetic carcinogens based on the
mechanism of carcinogenesis [74]. Genotoxic carcinogens
damage DNA directly and can usually be detected by various
short-term and less costly mutagenicity assays, such as the Ames assay,
gene mutation assay, chromosome aberration assay, DNA damage
assay, and micronucleus assay [75]. However, these methods are
not effective for nongenotoxic carcinogens because their
mechanisms of carcinogenesis are different and specific [74].
Given the high material and time costs of bioassays, there is an urgent
need to develop accurate computational models for predicting
carcinogenicity based on the structures and properties of chemicals.
Benfenati et al. classified the current in silico models for
carcinogenicity into structural alert (SA)-based models,
local models, and global models [76].
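As a rough illustration of the SA-based category, the sketch below flags a compound as a potential carcinogen when it contains at least one alert substructure, then scores the rule by its classification error (the fraction of misclassified compounds). It assumes that substructure detection has already been done upstream by a cheminformatics toolkit, so each compound is simply a set of substructure names. The alert names, the toy compounds, and the function names are all hypothetical and are not taken from any of the cited SA lists.

```python
from typing import Iterable, Sequence

# Hypothetical alert names for illustration only; the published SA lists
# cited in the text are not reproduced here.
ALERTS = frozenset({"aromatic_nitro", "epoxide", "n_nitroso"})


def predict_carcinogen(substructures: Iterable[str],
                       alerts: frozenset = ALERTS) -> bool:
    """Flag a compound if it contains at least one alert substructure."""
    return not alerts.isdisjoint(substructures)


def classification_error(predictions: Sequence[bool],
                         labels: Sequence[bool]) -> float:
    """Return the fraction of compounds whose prediction differs from the label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one compound is required")
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)


# Toy data (assumed): (substructures present, experimentally carcinogenic?)
compounds = [
    ({"aromatic_nitro", "benzene"}, True),
    ({"epoxide"}, True),
    ({"benzene", "hydroxyl"}, False),
    ({"carboxylic_acid"}, True),      # carcinogen with none of the alerts: missed
    ({"n_nitroso", "amide"}, False),  # alert present but inactive: false alarm
]

predictions = [predict_carcinogen(subs) for subs, _ in compounds]
labels = [label for _, label in compounds]
error = classification_error(predictions, labels)
accuracy = 1.0 - error
print(f"classification error = {error:.2f}, accuracy = {accuracy:.2f}")
```

Under this definition, accuracy is simply one minus the total classification error, which is how error rates and accuracies reported for different SA lists can be compared when they refer to the same data set.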
3.1 SA-Based Models for Carcinogenicity
SAs are defined as substructures that are considered to cause
potential toxicity [77]. Traditional SAs were often generated
based on the expert opinion of toxicologists. For example, in 1985 Ashby
proposed a hypothetical structure containing SAs linked to potential
carcinogenicity [78]. Thirty-three SAs were proposed by
Bailey et al. [79] based on Ashby’s SAs and a related list
compiled by Munro et al. [80]. Later, the growing structural diversity
of chemicals drove the development of machine learning
methods for extracting SAs. In 2005, a list of 29 SAs was
automatically extracted by data mining analysis and
yielded a total classification error of 18% for 4337 chemicals
[81]. In 2006, Kazius et al. adopted an elaborate chemical repre-
sentation method (called hierarchical graphs) and a substructure
mining method (called Gaston) and extracted six discriminative and
nonredundant substructures with an overall classification error of 21%
[82]. Benigni and Bossa combined the previous work mentioned
above and extracted a new list of SAs using Toxtree 1.50 [83]. Tox-
tree 1.50 showed higher accuracy (70%) for the same data set