Medicinal Chemistry

(Jacob Rumans) #1

These descriptors may be calculated using a variety of molecular mechanics and
quantum mechanics approaches, as discussed in chapter 1.
The geometric descriptors reflect molecular geometry and are conceptually
straightforward. Electronic descriptors reflect properties arising from variations in
electron distribution throughout the drug molecule framework. Topological descriptors
endeavor to describe molecular branching and complexity through the notion of molecular
connectivity. The concept of molecular connectivity, introduced by Kier and Hall in 1976,
describes compounds in topological terms. Branching, unsaturation, and molecular
shape are all represented in the purely empirical connectivity index ¹χ, which correlates
surprisingly well with a number of physicochemical properties, including partition
coefficient, molar refractivity, and boiling point. These graph theory indices are useful,
for example, for differentiating between an n-butyl substituent and a tert-butyl
substituent. The physicochemical indices reflect the ability of the drug to partition
itself into the lipid surroundings of the receptor microenvironment.
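
To make the branching sensitivity of the connectivity index concrete, the sketch below computes a first-order index in its commonly quoted form: the sum, over all bonds between non-hydrogen atoms, of (δi·δj)^(-1/2), where δ is the number of non-hydrogen neighbours of an atom. The function name `first_order_chi` and the two test skeletons (each butyl group attached to a single methyl carbon) are illustrative assumptions, not taken from the text.

```python
from math import sqrt

def first_order_chi(bonds):
    """First-order connectivity index of a hydrogen-suppressed molecular graph.

    bonds: list of (i, j) atom-index pairs, one per bond between heavy atoms.
    """
    # delta[i] = number of heavy-atom neighbours of atom i
    delta = {}
    for i, j in bonds:
        delta[i] = delta.get(i, 0) + 1
        delta[j] = delta.get(j, 0) + 1
    # Sum (delta_i * delta_j)^(-1/2) over all bonds
    return sum(1.0 / sqrt(delta[i] * delta[j]) for i, j in bonds)

# Hypothetical skeletons: atom 0 is a methyl carbon standing in for the
# rest of the molecule; atoms 1-4 are the butyl carbons.
n_butyl_methyl = [(0, 1), (1, 2), (2, 3), (3, 4)]     # unbranched chain
tert_butyl_methyl = [(0, 1), (1, 2), (1, 3), (1, 4)]  # quaternary centre

chi_n = first_order_chi(n_butyl_methyl)
chi_t = first_order_chi(tert_butyl_methyl)
print(f"n-butyl: {chi_n:.3f}  tert-butyl: {chi_t:.3f}")
# prints "n-butyl: 2.414  tert-butyl: 2.000"
```

Both skeletons contain the same five carbons, yet the branched isomer gives the lower index, which is the kind of distinction the topological descriptors are meant to capture.
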
All of these descriptors are calculated for every compound within the training set.
Next, a 2D data array is constructed. Along the vertical axis, all of the training set
compounds are listed in descending order of bioactivity. Along the horizontal axis, all
of the descriptors are arranged for every training set compound. This data array is then
probed with statistical calculations to ascertain the minimum number of descriptors that
differentiate active compounds from inactive compounds. Several methods are available
for probing the data array. Pattern recognition and cluster analysis, two recent
quantitative methods, make use of sophisticated statistics and computer software.
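
The layout of the data array described above can be sketched as follows. The compound names, activities, and descriptor values are invented purely for illustration, and the variable names are assumptions rather than anything prescribed by the text.

```python
# Hypothetical training set: (name, bioactivity, {descriptor: value}).
# All numbers are made up for illustration.
training_set = [
    ("cmpd_A", 5.2, {"logP": 1.8, "chi1": 2.41, "dipole": 1.1}),
    ("cmpd_B", 7.9, {"logP": 2.6, "chi1": 2.00, "dipole": 0.4}),
    ("cmpd_C", 6.4, {"logP": 2.1, "chi1": 2.27, "dipole": 0.9}),
]
descriptor_names = ["logP", "chi1", "dipole"]  # horizontal axis

# Vertical axis: compounds in descending order of bioactivity
ordered = sorted(training_set, key=lambda c: c[1], reverse=True)
row_labels = [name for name, _, _ in ordered]
activities = [activity for _, activity, _ in ordered]

# One row per compound, one column per descriptor
data_array = [[desc[d] for d in descriptor_names] for _, _, desc in ordered]

for label, row in zip(row_labels, data_array):
    print(label, row)
```
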
Pattern recognition can be used to deal with a large number of compounds, each
characterized by many parameters. First, however, these raw data must be processed by
scaling and normalization (the conversion of diverse units and orders of magnitude
from many sources) so that the chosen parameters become comparable. Feature
selection methods exist for weeding out irrelevant descriptors and retaining those
that are potentially most useful. By using "eigenvector" or "principal component"
analysis algorithms, these multidimensional data are then projected onto a
two-dimensional plot whose axes are the two principal components, that is, the two
(transformed and normalized) parameters that account for most of the variance; these
are the two eigenvectors with the largest eigenvalues. Previously unrecognized
relational patterns among large numbers of compounds characterized by multidimensional
descriptors thus emerge in a new, comprehensible, two-dimensional plot. Projecting
unknowns onto this eigenvector plot then shows their relationship to the active and
inactive compounds.
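
The scaling and projection steps can be illustrated with a minimal sketch. It assumes autoscaling (zero mean, unit standard deviation per descriptor) as the normalization and uses plain power iteration with deflation to find the two leading eigenvectors of the correlation matrix; a real study would normally rely on a numerical library instead. The descriptor values, the unknown compound, and all function names are hypothetical.

```python
from math import sqrt

def column_stats(rows):
    """Mean and sample standard deviation of each descriptor column."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    sds = [sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(m)]
    return means, sds

def autoscale(row, means, sds):
    """Convert one row to zero-mean, unit-variance units."""
    return [(x - mu) / sd for x, mu, sd in zip(row, means, sds)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(c, v):
    return [dot(row, v) for row in c]

def top_eigenpair(c, iters=500):
    """Power iteration: dominant eigenvalue and unit eigenvector of a
    symmetric matrix. Assumes the start vector is not orthogonal to it."""
    v = [1.0] + [0.0] * (len(c) - 1)
    for _ in range(iters):
        w = matvec(c, v)
        norm = sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(c, v)), v

# Made-up descriptor values (columns: logP, chi1, dipole) for six compounds
raw = [
    [1.0, 2.0, 3.0], [2.0, 2.5, 1.9], [3.0, 3.1, 2.6],
    [4.0, 3.4, 0.8], [5.0, 4.1, 1.6], [6.0, 4.4, 0.7],
]
means, sds = column_stats(raw)
z = [autoscale(r, means, sds) for r in raw]
n, m = len(z), len(means)

# Correlation matrix of the autoscaled descriptors
corr = [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(m)]
        for a in range(m)]

lam1, pc1 = top_eigenpair(corr)
# Deflation: remove the first component, then find the next one
deflated = [[corr[a][b] - lam1 * pc1[a] * pc1[b] for b in range(m)]
            for a in range(m)]
lam2, pc2 = top_eigenpair(deflated)

# Two-dimensional coordinates of each training compound
scores = [(dot(r, pc1), dot(r, pc2)) for r in z]
# The trace of a correlation matrix equals the number of descriptors
explained = (lam1 + lam2) / m

# Project a hypothetical unknown using the training-set scaling
unknown_xy = tuple(dot(autoscale([3.5, 3.0, 2.0], means, sds), pc)
                   for pc in (pc1, pc2))
print(f"variance explained by two components: {explained:.2f}")
print("unknown projects to", unknown_xy)
```

Note that the unknown is scaled with the means and standard deviations of the training set, so that it lands in the same coordinate system as the compounds that defined the plot.
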
Cluster analysis is similar in concept to pattern recognition. It can define the
similarity or dissimilarity of observations or can reveal the number of groups formed
by a collection of data. The distance between two clusters of data points is defined
either by the distance between their two closest members or by the distance between
the cluster centers.
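
The two definitions of inter-cluster distance mentioned above can be compared directly. In this sketch the closest-member definition is implemented as a minimum over all cross-cluster pairs and the center-based definition as the distance between centroids; the point coordinates and function names are assumptions chosen for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+

def closest_member_distance(a, b):
    """Distance between the two closest members of clusters a and b."""
    return min(dist(p, q) for p in a for q in b)

def centroid(points):
    """Coordinate-wise mean of a cluster."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def center_distance(a, b):
    """Distance between the centers (centroids) of clusters a and b."""
    return dist(centroid(a), centroid(b))

# Two made-up clusters in a two-dimensional descriptor space
cluster_1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
cluster_2 = [(4.0, 0.0), (5.0, 0.0), (4.0, 1.0), (5.0, 1.0)]

print(closest_member_distance(cluster_1, cluster_2))  # 3.0
print(center_distance(cluster_1, cluster_2))          # 4.0
```

The two measures generally differ, so the choice between them affects which clusters are judged to be neighbours.
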
Once the data array has been probed and the minimum number of descriptors that
differentiate activity from inactivity has been ascertained, a prediction algorithm is
deduced. This algorithm attempts to quantify the bioactivity in terms of the relevant
descriptors. The predictive usefulness of this algorithm is then validated by applying
it to the test set compounds. If the prediction algorithm is sufficiently robust, it
can be used to direct the syntheses of optimized compounds.
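
As a minimal sketch of this last step, the example below assumes the simplest possible prediction algorithm, a least-squares straight line relating bioactivity to a single relevant descriptor, and checks it against a separate test set. The text does not prescribe this particular model; the data and names are invented for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up training set: one descriptor value and bioactivity per compound
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [2.1, 3.9, 6.2, 7.8]
slope, intercept = fit_line(train_x, train_y)

def predict(x):
    """Predicted bioactivity from the fitted line."""
    return slope * x + intercept

# Made-up test set, held out from the fit, used only for validation
test_x = [5.0, 6.0]
test_y = [10.0, 11.6]
rmse = sqrt(sum((predict(x) - y) ** 2 for x, y in zip(test_x, test_y))
            / len(test_x))

print(f"activity = {slope:.2f} * descriptor + {intercept:.2f}")
print(f"test-set RMSE: {rmse:.2f}")
```

A small prediction error on compounds that played no part in the fit is what would justify using the model to prioritize new syntheses.
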

