Computational Drug Discovery and Design


Fig. 14 Performing sequential feature selection using logistic regression to identify features that discriminate
between active and non-active molecules. After importing the Python classes for fitting the LogisticRegression
classifier within the SequentialFeatureSelector, by setting forward=False
and floating=False, we specify that the sequential feature selector should perform regular backward
selection. Then we use the plot_sfs function to visualize the results with matplotlib's pyplot
submodule. The resulting plot in this figure shows the classification accuracy of the logistic regression models
trained on different feature subsets (functional group matching patterns) via sequential backward selection.
The prediction accuracy (0 = worst, 1 = best), where 1 corresponds to 100% accuracy in predicting active
versus non-active compounds across the input set, was then computed via fivefold cross validation. The plot
presents the average prediction accuracy (whether the model can predict held-out active and non-active
molecules given their functional group matching patterns) across the five different test sets. The error margin
(pale blue region above and below the dark blue average points) shows the standard error of the mean for the
fivefold cross validation


Sebastian Raschka et al.
