Computational Drug Discovery and Design

conformation dependency would be to determine the bound
(also called bioactive) conformation and the conformational
energy. Defining the bound conformation is a difficult and
time-consuming task. Bound conformations of ligands can be
obtained experimentally by NMR or X-ray crystallography, but
the number of ligands which have not been cocrystallized with
the target protein greatly exceeds the number that have. Crys-
tal structures also have limitations, from data acquisition and
data refinement errors to the potential inadequacy of crystal
structures to represent the conformational ensemble in solution.
Valuable clues about the bioactive conformation can be obtained
when rigid ligands with restricted conformational freedom are
available. When no clues about the bioactive conformation can be
inferred from experimental data or rigid ligands, the modeler
has no alternative but to sample the potential energy surface of
the ligands; a number of methods (all of them computationally
demanding) are available for this purpose, including systematic
search, stochastic approaches, and molecular dynamics.
Very often, rough approximations are made at this stage, from
using the presumed global energy minimum or a local energy
minimum (which is not necessarily representative of the
bioactive conformation) to energy minimization procedures
in vacuum that neglect solvent effects. Note that the strain
energy is typically below 10 kcal/mol, but there are
exceptions.
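
The idea of a systematic search and of a strain energy can be sketched with a deliberately simple toy model. The snippet below assumes a single rotatable bond described by a hypothetical torsional potential (the functional form and its parameters are invented for illustration and do not come from any force field), scans the torsion angle on a regular grid, and reports the energy of an assumed bound conformation relative to the lowest-energy conformer found. Taking the global minimum of the scan as the reference state is itself an assumption, and the "bound" angle of 60 degrees is made up.

```python
import math


def torsion_energy(phi_deg: float) -> float:
    """Toy torsional potential in kcal/mol (hypothetical parameters)."""
    phi = math.radians(phi_deg)
    # Threefold term plus a onefold term: minimum at 180 degrees (anti),
    # higher-energy minima near 60 and 300 degrees (gauche).
    return 1.5 * (1 + math.cos(3 * phi)) + 0.5 * (1 + math.cos(phi))


def systematic_search(step_deg: int = 10) -> list[tuple[int, float]]:
    """Evaluate the potential on a regular grid of torsion angles."""
    return [(angle, torsion_energy(angle)) for angle in range(0, 360, step_deg)]


def strain_energy(bound_angle_deg: float, conformers: list[tuple[int, float]]) -> float:
    """Energy of the assumed bound conformation relative to the lowest
    energy found in the scan (assumed reference state)."""
    lowest = min(energy for _, energy in conformers)
    return torsion_energy(bound_angle_deg) - lowest


conformers = systematic_search(step_deg=10)
global_min_angle, global_min_energy = min(conformers, key=lambda c: c[1])
strain = strain_energy(60.0, conformers)

print(f"global minimum at {global_min_angle} degrees")
print(f"strain energy of the assumed bound conformation: {strain:.2f} kcal/mol")
```

With one torsion the grid has only 36 points; the number of grid points grows exponentially with the number of rotatable bonds, which is one reason why such searches become computationally demanding for flexible ligands.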
In most applications, a subset of descriptors is chosen
from a relatively large pool of descriptors. A variety of
methods are available for descriptor selection (genetic
algorithms, stepwise approaches, the replacement method, and
many others). How many descriptors should be allowed into
the model? In our experience, at least 10 training compounds
per independent variable is a good rule of thumb for controlling
the generalization error and avoiding overfitting. Some authors
propose that, for noisy data, an optimal trade-off between
approximation and estimation errors is achieved if the number
of parameters in the model is around the cube root of the
number of training examples (this is the most conservative
criterion that we have come across so far). In any case,
overfitting can be checked retrospectively with adequate
validation protocols.
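
The two rules of thumb for model size are easy to compare numerically. The following is a minimal sketch; the function name and the choice to round the cube root to the nearest integer are assumptions made for illustration, not part of either rule.

```python
def max_descriptors(n_training: int) -> dict[str, int]:
    """Upper bounds on the number of descriptors under two rules of thumb."""
    if n_training < 1:
        raise ValueError("n_training must be a positive integer")
    return {
        # At least 10 training compounds per independent variable.
        "ten_per_variable": n_training // 10,
        # Number of parameters around the cube root of the training set size
        # (rounded to the nearest integer, an assumption of this sketch).
        "cube_root": round(n_training ** (1 / 3)),
    }


for n in (80, 1000):
    print(n, max_descriptors(n))
```

For a training set of 80 compounds the first rule allows up to 8 descriptors and the second about 4; for 1000 compounds the limits are 100 and 10, which shows how much more conservative the cube-root criterion becomes as the training set grows.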


  1. With the sole exception of similarity searches, which are not
    subjected to in silico validation (the predictions are instead
    validated directly by experiment), all the other approaches
    described (structure-based approximations, machine learning,
    and pharmacophores) should be validated in silico, although
    how the validation is carried out depends on the technique.
    In the case of docking, for instance, the most frequent
    validation criteria include a method’s ability to reproduce the


Computer-Aided Drug Design 15