Computational Drug Discovery and Design

Fig. 12 Similar to fitting a DecisionTreeClassifier (Fig. 11), we first initialize a new
RandomForestClassifier object from scikit-learn and fit it to the functional group matching pattern
array (X) and the labels of the active and non-active molecules (y_binary). By setting
n_estimators=1000, we use 1000 decision trees for the forest. Setting n_jobs=-1 means that we
utilize all processors on our machine to fit those decision trees in parallel, which speeds up the
computation. The random_state parameter accepts an arbitrary integer that seeds the bootstrap
sampling and feature selection in the decision trees, making the experiment deterministic and
reproducible
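
The fitting step described in this caption can be sketched as follows. This is a minimal illustration, not the chapter's original listing: the arrays X and y_binary below are small synthetic stand-ins (an assumption) for the functional group matching patterns and the activity labels, and the seed value 0 is arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 60 molecules, 5 binary
# functional-group match features. The real X and y_binary come
# from the chapter's dataset and are not reproduced here.
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(60, 5))
y_binary = (X[:, 0] | X[:, 2]).astype(int)  # toy "active" label

# 1000 trees, fitted in parallel on all available processors
# (n_jobs=-1); random_state fixes the bootstrap sampling and the
# feature selection so that repeated runs give the same forest.
forest = RandomForestClassifier(n_estimators=1000,
                                n_jobs=-1,
                                random_state=0)
forest.fit(X, y_binary)
```

After fitting, forest.estimators_ holds the individual decision trees and forest.feature_importances_ holds one relative importance value per column of X.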

Fig. 13 Relative feature importance of the functional group matches inferred from the random forest model
that was trained to discriminate between active and non-active molecules. First, the importance values are
sorted from highest to lowest using NumPy's argsort function. Next, we summarize the computed feature
importance in a bar plot using matplotlib's pyplot submodule, which was imported as plt earlier
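
The sorting and plotting steps in this caption can be sketched as follows. The importance values and feature labels below are hypothetical placeholders (an assumption); in the chapter they would come from the fitted forest's feature_importances_ attribute and the functional group names. Because argsort sorts in ascending order, the index array is reversed to go from highest to lowest.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical importance values and labels (assumption), standing
# in for forest.feature_importances_ and the functional group names.
importances = np.array([0.12, 0.45, 0.05, 0.30, 0.08])
feature_labels = np.array(["group A", "group B", "group C",
                           "group D", "group E"])

# argsort returns ascending order, so reverse it for highest-first.
indices = np.argsort(importances)[::-1]

# Bar plot of the importances in sorted order.
positions = np.arange(len(importances))
plt.bar(positions, importances[indices], align="center")
plt.xticks(positions, feature_labels[indices], rotation=90)
plt.ylabel("Relative feature importance")
plt.tight_layout()
```

Calling plt.show() or plt.savefig() afterwards displays or stores the figure, depending on the environment.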

324 Sebastian Raschka et al.
