Computational Drug Discovery and Design


3.6 Conclusion

From the decision tree analysis (Section 3.3), random forest feature
importance estimation (Section 3.4), and sequential feature selection
results (Section 3.5), we can conclude that the sulfate groups
(Sulfur, Sulfate-Ester, and Sulfur-Oxygen features) are the most
discriminatory features for distinguishing active from non-active
compounds in DKPES-mediated olfactory responses. From the
inspection of heat maps showing the ten most active and ten least
active molecules (Section 3.2), we also observed that the presence of
a sulfate tail is a consistent determinant of activity. One compound
consisting only of a sulfate tail (ZINC14591952, Fig. 8) resulted in
62% signal inhibition, which supports the hypothesis that sulfate
groups are a key feature of active molecules. Figure 16 summarizes
the results from the random forest feature importance estimation
by comparing the importance values to the proportion of functional
group matches in active and non-active molecules.
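As a minimal sketch of how the proportion of functional group matches in active and non-active molecules can be tabulated, the following Python example computes per-feature match rates from a binary feature table. The molecule rows, the "Hypothetical-Group" feature, and the function name match_proportions are assumptions made for illustration only; the values are invented and are not taken from the DKPES dataset.

```python
# Toy, made-up data: each row is one molecule with binary functional-group
# matches (1 = group present). These values are NOT from the DKPES dataset.
FEATURES = ["Sulfur", "Sulfur Oxygens", "Hypothetical-Group"]

molecules = [
    # (is_active, [Sulfur, Sulfur Oxygens, Hypothetical-Group])
    (True,  [1, 1, 0]),
    (True,  [1, 1, 1]),
    (True,  [1, 0, 1]),
    (False, [0, 0, 1]),
    (False, [1, 0, 1]),
    (False, [0, 0, 0]),
    (False, [0, 0, 1]),
]


def match_proportions(rows, feature_names):
    """Return {feature: (proportion in actives, proportion in non-actives)}."""
    actives = [bits for is_active, bits in rows if is_active]
    inactives = [bits for is_active, bits in rows if not is_active]
    if not actives or not inactives:
        raise ValueError("need at least one active and one non-active molecule")
    result = {}
    for j, name in enumerate(feature_names):
        # Fraction of molecules in each class that contain this group.
        p_active = sum(bits[j] for bits in actives) / len(actives)
        p_inactive = sum(bits[j] for bits in inactives) / len(inactives)
        result[name] = (p_active, p_inactive)
    return result


proportions = match_proportions(molecules, FEATURES)
for name, (p_act, p_inact) in proportions.items():
    print(f"{name:20s} actives={p_act:.2f}  non-actives={p_inact:.2f}")
```

A feature with a large gap between the two proportions is one that a classifier can readily exploit, so these rates can be read side by side with feature importance values from a fitted model.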
The data in Fig. 16 show that “Sulfur” and “Sulfur Oxygens”
are the most discriminatory features for a random forest to
distinguish actives from non-actives, and both features also occur
at a higher rate in active than in non-active molecules. Features


Fig. 16 Proportion of functional group matches across the 12 active and 44 inactive molecules and relative
functional group feature importance (from the random forest analysis, Fig. 13) mapped onto the DKPES
reference molecule. DB refers to “double bond”.


330 Sebastian Raschka et al.
