Computational Drug Discovery and Design

2.3 Graph Visualization Software

To visualize the decision trees later in this chapter, an installation of GraphViz is needed. The GraphViz package is freely available at http://www.graphviz.org, along with installation and setup instructions.
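Before moving on, it can be useful to confirm that the GraphViz installation is reachable from Python. The following is a minimal sketch, not part of the chapter's accompanying code. It assumes that the GraphViz command-line tool `dot` has been placed on the system PATH by the installer; the helper names `find_graphviz_dot` and `graphviz_version` are hypothetical and introduced only for illustration.

```python
import shutil
import subprocess


def find_graphviz_dot(executable="dot"):
    """Return the full path of the GraphViz `dot` executable, or None if it is not on PATH."""
    # Assumption: the GraphViz installer added its `dot` program to PATH.
    return shutil.which(executable)


def graphviz_version(dot_path):
    """Return the version banner reported by `dot -V`."""
    # The banner may be written to stderr rather than stdout, so capture both.
    result = subprocess.run(
        [dot_path, "-V"], capture_output=True, text=True, check=False
    )
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    dot_path = find_graphviz_dot()
    if dot_path is None:
        print("GraphViz 'dot' was not found on PATH; revisit the setup instructions.")
    else:
        print(f"Found GraphViz at {dot_path}: {graphviz_version(dot_path)}")
```

If the script reports that `dot` was not found, the most likely cause is that the GraphViz binaries directory still needs to be added to PATH, as described in the GraphViz setup instructions.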

2.4 Dataset

The datasets used in this chapter, as well as the source files of all the accompanying code, are available online under a permissive open source license at https://github.com/psa-lab/predicting-activity-by-machine-learning.

2.5 Additional Resources

If you are unfamiliar with Python and the Python libraries that you installed in Section 2.2, it is highly recommended to familiarize yourself with their basic functionality by reading these freely available resources:

• Python Beginner Guide: https://wiki.python.org/moin/BeginnersGuide
• NumPy Quickstart Tutorial: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
• Introduction to NumPy: https://sebastianraschka.com/pdf/books/dlb/appendix_f_numpy-intro.pdf
• 10 Minutes to Pandas: http://pandas.pydata.org/pandas-docs/stable/10min.html
• Matplotlib Tutorials: https://matplotlib.org/users/index.html
• An Introduction to Machine Learning Using Scikit-learn: http://scikit-learn.org/stable/tutorial/basic/tutorial.html

3 Methods

This section walks through the individual steps involved in a typical analysis pipeline for identifying which functional groups and atoms (or other molecular properties or features) are predictive of the measured biological activity of the molecules.

3.1 Loading and Inspecting the Biological Activity Dataset

This section explains how to load a CSV-formatted dataset table (e.g., the DKPES dataset) into the current Python session. A convenient way to parse a dataset from a tabular plaintext format, such as CSV, is to use the read_csv function from the Pandas library, as shown in the code example in Fig. 6, which loads the DKPES dataset into a Pandas DataFrame object (df) for further processing (see Note 1). When the code shown in Fig. 6 is executed, the df.head(10) call displays the first ten rows of the dataset, which confirms that the data file has been parsed correctly. The DKPES dataset consists of 56 rows, where each row stores the functional

Inferring Activity Discriminants 315
