3.5 Sequential Feature Selection with Logistic Regression
As an alternative approach and to probe the robustness of our
conclusions, we will apply a Sequential Backward Selection (SBS)
algorithm combined with logistic regression [32] for the
classification of active versus non-active compounds. SBS is a
model-agnostic feature selection algorithm that evaluates different
combinations of features, shrinking the subset of features to be
considered one by one. Here, model-agnostic refers to the fact that
SBS can be combined with any machine learning algorithm for
classification or regression.
In general, sequential feature selection algorithms are greedy
search algorithms that reduce the $d$-dimensional feature space to a
smaller $k$-dimensional subspace, where $k < d$. The sequential
feature selection approach selects the best-performing feature
subsets automatically and can help optimize two objectives:
improving the computational efficiency and reducing the
generalization error of a model by getting rid of features that are
irrelevant.
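
To give a rough sense of what "greedy" buys here, the following sketch compares the number of candidate subsets an exhaustive search over all $k$-element subsets would score with the number a backward sequential search scores. It assumes that the sequential search scores one candidate per removable feature at each iteration; the function names and the values $d = 20$, $k = 5$ are illustrative assumptions, not taken from the study.

```python
from math import comb


def exhaustive_evaluations(d: int, k: int) -> int:
    """Number of k-element subsets of d features (every one would be scored)."""
    return comb(d, k)


def sbs_evaluations(d: int, k: int) -> int:
    """Candidate subsets scored by a backward sequential search.

    With j features left, j candidate removals are scored, for
    j = d, d - 1, ..., k + 1 (assumption stated in the lead-in).
    """
    return sum(range(k + 1, d + 1))


if __name__ == "__main__":
    d, k = 20, 5  # illustrative values only
    print("exhaustive:", exhaustive_evaluations(d, k))
    print("sequential:", sbs_evaluations(d, k))
```

The price of the smaller search is that a greedy procedure is not guaranteed to find the best possible subset of size $k$.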
The SBS algorithm removes features from the initial feature
subset sequentially until the reduced feature subspace contains a
specified number of features. To determine which feature is to be
removed at each iteration of the SBS algorithm, we need to define a
criterion function $J$ that scores a candidate feature subset, for
instance by the performance of the model trained on it. The
difference between the performance of the model before and after a
feature removal is to be minimized, which is equivalent to
maximizing $J$ over the reduced subsets. In other words, at each
iteration of the algorithm, the feature whose removal results in the
least performance loss (or highest performance gain) is eliminated.
This removal of features is repeated until the desired,
pre-specified size of the feature subset is
reached. More formally, we can express the SBS algorithm in the
following pseudo-code notation adapted from [30]:
1. Initialize the algorithm with $k = d$, where $d$ is the
   dimensionality of the full feature space $X_d$.
2. Determine the feature $x$ that maximizes the criterion:
   $x = \arg\max J(X_k - x)$, where $x \in X_k$.
3. Remove the feature $x$ from the feature set: $X_{k-1} = X_k - x$;
   $k = k - 1$.
4. Terminate if $k$ equals the number of desired features; otherwise,
   go to step 2.
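
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: the criterion $J$ is passed in as an arbitrary `score` callable to reflect that SBS is model-agnostic, and `toy_score`, `TOY_WEIGHTS`, and the feature names are hypothetical stand-ins. In practice, `score` might return, for example, the validation accuracy of a logistic regression model fit on the given feature subset.

```python
from typing import Callable, List, Sequence, Tuple

Subset = Tuple[str, ...]


def sequential_backward_selection(
    features: Sequence[str],
    score: Callable[[Subset], float],
    k_features: int,
) -> Tuple[Subset, List[float]]:
    """Greedy SBS: drop one feature per iteration until k_features remain."""
    if not 1 <= k_features <= len(features):
        raise ValueError("k_features must be between 1 and len(features)")

    current: Subset = tuple(features)      # step 1: start from the full set
    history = [score(current)]             # criterion value after each step

    while len(current) > k_features:       # step 4: stop at the desired size
        # Step 2: score every subset obtained by removing a single feature.
        scored = [
            (score(candidate), candidate)
            for candidate in (
                tuple(f for f in current if f != x) for x in current
            )
        ]
        best_score, best_subset = max(scored, key=lambda pair: pair[0])
        # Step 3: keep the reduced subset with the highest criterion value.
        current = best_subset
        history.append(best_score)

    return current, history


# Hypothetical criterion for illustration only: each feature adds a fixed
# amount to a baseline score, so the two "noise" features hurt performance.
TOY_WEIGHTS = {
    "hbond_donor": 0.20,
    "aromatic_ring": 0.15,
    "noise_a": -0.05,
    "noise_b": -0.02,
}


def toy_score(subset: Subset) -> float:
    return 0.5 + sum(TOY_WEIGHTS[f] for f in subset)


if __name__ == "__main__":
    selected, history = sequential_backward_selection(
        list(TOY_WEIGHTS), toy_score, k_features=2
    )
    print(selected)
    print(history)
```

Because the criterion is evaluated on the reduced subset, picking the maximum in step 2 is the same as removing the feature whose loss costs the least performance.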
The reason why we chose sequential feature selection to deduce
functional group matching patterns that are predictive of active and
non-active molecules is that it presents an intuitive method that has
been shown to produce accurate and robust results (see Note 12).
For more information on sequential feature selection, please
read [17].
Logistic regression is one of the most widely used classification
algorithms in academia and industry. One of the reasons why
logistic regression is a popular choice for predictive modeling is