Computational Drug Discovery and Design


  1. Throughout Section 3, we assumed that the data frame of
    activity data was already sorted by signal inhibition in decreasing
    order. While sorting the data frame is not essential for
    fitting the machine learning models in the later section, you
    may want to sort your datasets for the heat map visualization,
    for example, to show the ten molecules with the highest inhibition
    activity. To sort a data frame df, you can use the
    sort_values method of the pandas DataFrame object.
    For example, the following code sorts the molecules stored in the
    data frame df from most active to least active:
        df = df.sort_values('Signal-Inhibition', ascending=False)
    More information about the sort_values method can be
    found in the official pandas documentation at
    https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html.

  2. While we recommend working with 3D structures because they
    provide spatial relationships between chemical groups, molecular
    features can also be derived from 1D string representations
    of molecules or from 2D structural representations. For example,
    molecular fingerprints, which encode the presence of certain
    substructures or atom types, can be computed using the open-source
    toolkit OpenBabel (https://openbabel.org/docs/dev/Fingerprints/intro.html).

  3. To convert a 1D or 2D representation of a molecule into a 3D
    structure, you may find the following tools helpful. A typical use
    is preparing input for spatial functional group matching, such as
    the matching performed on the DKPES dataset via Screenlamp [10]
    using ROCS overlays (OpenEye Scientific Software, Santa Fe, NM;
    https://www.eyesopen.com/rocs).
    - The CACTUS online SMILES translator and structure file
      generator (https://cactus.nci.nih.gov/translate/).
    - OMEGA (OpenEye Scientific Software, Santa Fe, NM;
      https://www.eyesopen.com/omega), which generates multiple
      favorable 3D conformers of a given structure from 1D,
      2D, or 3D representations [38, 39]. This software is available
      free of charge to academic researchers upon completion of a
      license agreement with OpenEye.

  4. Further, you may find the BioPandas toolkit [40] helpful
    (http://rasbt.github.io/biopandas/). It reads 3D
    structures from the common MOL2 file format into pandas
    data frames. This can be useful if you are working
    with large MOL2 databases that contain thousands or
    millions of structures. You may want to filter these structures
    for certain properties before generating overlays via ROCS or
    before computing the functional group matching patterns via
    Screenlamp (https://github.com/psa-lab/screenlamp).
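
As a minimal sketch of the sorting step described in Note 1, the snippet below sorts a small toy activity table from most to least active. It then keeps up to ten of the most active molecules, as one might do before drawing a heat map. The molecule IDs and inhibition values are made up for illustration. Only the 'Signal-Inhibition' column name follows Note 1, and your own data frame may use different column names.

```python
import pandas as pd

# Toy activity table; the IDs and inhibition values are hypothetical.
df = pd.DataFrame({
    "ID": ["mol_a", "mol_b", "mol_c", "mol_d"],
    "Signal-Inhibition": [0.12, 0.87, 0.45, 0.66],
})

# Sort from most active (highest inhibition) to least active and
# renumber the rows so that the index reflects the new ranking.
df = df.sort_values("Signal-Inhibition", ascending=False).reset_index(drop=True)

# Keep (up to) the ten most active molecules, e.g., for a heat map.
top = df.head(10)
print(top["ID"].tolist())
```

The reset_index(drop=True) call is optional. It only discards the original row labels, which would otherwise keep their pre-sorting values.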
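
Note 2 mentions molecular fingerprints as an alternative source of features. The sketch below assumes that the fingerprints have already been computed, for example with OpenBabel. It also assumes that each molecule's fingerprint is available as the set of bit positions that are switched on. The code stacks these sets into a binary feature matrix of the kind a machine learning model can take as input. The bit sets, the 8-bit fingerprint length, and the 0-based bit indexing are all hypothetical choices made for this illustration.

```python
import numpy as np

# Hypothetical fingerprints: for each molecule, the set of "on" bit
# positions (0-based here), i.e., the substructure patterns detected.
fingerprints = {
    "mol_a": {0, 3, 5},
    "mol_b": {1, 3},
    "mol_c": {0, 1, 2, 7},
}
n_bits = 8  # toy length; real fingerprints are typically much longer


def to_feature_matrix(fps, n_bits):
    """Stack per-molecule on-bit sets into a binary (n_molecules x n_bits) array."""
    ids = sorted(fps)  # fixed row order so rows can be matched to labels
    X = np.zeros((len(ids), n_bits), dtype=np.uint8)
    for row, mol_id in enumerate(ids):
        X[row, sorted(fps[mol_id])] = 1  # set the bits present in this molecule
    return ids, X


ids, X = to_feature_matrix(fingerprints, n_bits)
print(ids)
print(X)
```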
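
Note 4 suggests filtering large MOL2 databases before running ROCS or Screenlamp. In BioPandas, PandasMol2().read_mol2(...) returns an object whose df attribute holds the atoms of a molecule in a data frame. This data frame has columns such as atom_name, atom_type, and x, y, z. To keep the sketch below runnable without BioPandas or MOL2 files, it uses small hand-made data frames with only the atom_name and atom_type columns. The filter criteria are hypothetical examples, not a recommended protocol. They require at least one atom of the Sybyl type 'O.2' and allow at most three non-hydrogen atoms.

```python
import pandas as pd

# Hand-made stand-ins for the per-molecule atom tables that BioPandas
# would provide (pmol.df); only the columns used below are included.
molecules = {
    "mol_a": pd.DataFrame({"atom_name": ["C1", "O1", "H1"],
                           "atom_type": ["C.2", "O.2", "H"]}),
    "mol_b": pd.DataFrame({"atom_name": ["C1", "C2", "O1"],
                           "atom_type": ["C.3", "C.3", "O.3"]}),
    "mol_c": pd.DataFrame({"atom_name": ["C1", "N1", "O1", "C2"],
                           "atom_type": ["C.ar", "N.ar", "O.2", "C.3"]}),
}


def passes_filter(atoms, required_type="O.2", max_heavy_atoms=3):
    """Hypothetical filter: require at least one atom of `required_type`
    and at most `max_heavy_atoms` non-hydrogen atoms."""
    heavy = atoms[atoms["atom_type"] != "H"]
    has_required = (atoms["atom_type"] == required_type).any()
    return bool(has_required) and len(heavy) <= max_heavy_atoms


keep = [code for code, atoms in molecules.items() if passes_filter(atoms)]
print(keep)
```

For a real multi-molecule MOL2 file, you could apply the same predicate to each molecule in turn. One approach is to iterate over BioPandas' split_multimol2(path), which yields (code, lines) pairs. For each pair, call PandasMol2().read_mol2_from_list(mol2_lines=lines, mol2_code=code) and pass the resulting df to passes_filter.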


332 Sebastian Raschka et al.
