(within 1.3 Å) with the same group in DKPES. This functional
group matching data is stored as a binary variable, where 0 indicates
“no overlay” and 1 indicates “overlay.” In addition, the ROCS
shape and chemistry (“color”) overlay scores were appended to
the dataset. For information on how the overlay scores are computed, the reader is referred to https://docs.eyesopen.com/rocs/shape_theory.html and Hawkins et al. [4]; other molecular similarity measures could be used instead (see Notes 3–5).
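The chapter does not list code for building these features. The sketch below shows both under stated assumptions. First, the overlay flag compares the centroids of the two functional groups against the 1.3 Å cutoff. The exact matching criterion used for DKPES may differ. Second, the ROCS scores are written in the Tanimoto form O_AB / (I_A + I_B − O_AB). The overlap and self-overlap volumes are taken as given inputs, whereas ROCS computes them itself from Gaussian atomic volumes. The function names `overlay_flag`, `tanimoto`, and `tanimoto_combo` are hypothetical.

```python
import numpy as np

OVERLAY_CUTOFF = 1.3  # Å, the distance threshold stated in the text


def overlay_flag(query_xyz, ref_xyz, cutoff=OVERLAY_CUTOFF):
    """Binary functional-group match: 1 = "overlay", 0 = "no overlay".

    Assumption (not from the chapter): the groups match if their
    centroids lie within `cutoff` Å of each other after alignment.
    """
    q = np.asarray(query_xyz, dtype=float).reshape(-1, 3).mean(axis=0)
    r = np.asarray(ref_xyz, dtype=float).reshape(-1, 3).mean(axis=0)
    return int(np.linalg.norm(q - r) <= cutoff)


def tanimoto(overlap, self_a, self_b):
    """Tanimoto form O_AB / (I_A + I_B - O_AB); lies in [0, 1].

    Overlap and self-overlap volumes are supplied by the caller here;
    ROCS derives them from Gaussian atomic volumes.
    """
    return overlap / (self_a + self_b - overlap)


def tanimoto_combo(shape_terms, color_terms):
    """Shape Tanimoto + color Tanimoto, so the result lies in [0, 2].

    Assumption: the color term uses the same Tanimoto form, applied to
    color-feature overlaps instead of shape volumes.
    """
    return tanimoto(*shape_terms) + tanimoto(*color_terms)
```

For example, `overlay_flag([[0, 0, 0]], [[1.0, 0, 0]])` returns 1 because the centroids are 1.0 Å apart. `tanimoto_combo((10, 10, 10), (5, 5, 5))` returns 2.0, the maximum, because in each term the overlap equals both self-overlaps.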
It is always helpful to perform exploratory analyses when working with a new dataset. The code snippet shown in Fig. 7 generates the histogram of the signal inhibition values and the 2D scatter plot comparing the signal inhibition values with the molecular similarity measured by the overlays (both plots are shown in Fig. 7). First, the signal inhibition data from the data frame (df) are assigned to a variable y, and the functional group columns of interest to a variable X. Next, the code in Fig. 7 demonstrates how to use matplotlib to create

Fig. 7 Code for performing exploratory analysis in Python using the matplotlib library to plot a histogram of the
“Signal Inhibition” data and a scatter plot to inspect the relationship between the signal inhibition and overlay
scores. In the corresponding programming code, the “Signal Inhibition” column is first assigned to a variable y, and the functional groups of interest are assigned to the variable fgroup_cols. This variable is then used to create the matrix X, which stores the matching patterns of those functional groups. Next, a figure with two subplots is initialized by calling plt.subplots from matplotlib. The plt.hist function adds the histogram to the first subplot (ax[0]), and the plt.scatter function draws the scatter plot in the subplot to the right (ax[1]). The resulting plots show the DKPES inhibitor activity distribution for the 56 compounds that were assayed (left) and the relationship between activity and overlay similarity from ROCS (right). Similarity is given as the TanimotoCombo score in the range 0–2, where 2 means that two 3D structures have an identical volume and partial charge distribution.
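The sketch below shows the kind of exploratory code the Fig. 7 caption describes. It is not the chapter's code. The data frame holds random placeholder values, not the DKPES assay data. Only the "Signal Inhibition" column name comes from the text; the column names `TanimotoCombo`, `fg_A`, and `fg_B` are hypothetical. The sketch calls the Axes methods `ax[0].hist` and `ax[1].scatter`, which are the object-oriented counterparts of `plt.hist` and `plt.scatter`. Calling them on each Axes object draws on that subplot explicitly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DKPES data frame: random placeholder
# values for 56 compounds, NOT the real assay measurements.
rng = np.random.default_rng(0)
n = 56
df = pd.DataFrame({
    "Signal Inhibition": rng.uniform(0.0, 1.0, n),
    "TanimotoCombo": rng.uniform(0.0, 2.0, n),  # assumed column name
    "fg_A": rng.integers(0, 2, n),  # hypothetical binary overlay column
    "fg_B": rng.integers(0, 2, n),  # hypothetical binary overlay column
})

# Target values and the functional-group matching matrix
y = df["Signal Inhibition"].values
fgroup_cols = ["fg_A", "fg_B"]  # hypothetical functional groups of interest
X = df[fgroup_cols].values

# One figure, two side-by-side subplots
fig, ax = plt.subplots(1, 2, figsize=(10, 4))

# Left: distribution of signal inhibition values
ax[0].hist(y, bins=10)
ax[0].set_xlabel("Signal inhibition")
ax[0].set_ylabel("Count")

# Right: activity versus ROCS overlay similarity
ax[1].scatter(df["TanimotoCombo"], y)
ax[1].set_xlabel("TanimotoCombo (0-2)")
ax[1].set_ylabel("Signal inhibition")

fig.tight_layout()
fig.savefig("exploratory_analysis.png")  # hypothetical output file name
```

With the real data, the same calls apply once `df` is loaded from the dataset and `fgroup_cols` lists its actual functional group columns.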
Inferring Activity Discriminants 317