Computational Drug Discovery and Design

2.3 Graph Visualization Software


To visualize the decision trees later in this chapter, an installation of
GraphViz is needed. The GraphViz package is freely available at
http://www.graphviz.org, together with installation and setup
instructions.
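
As a quick check that the installation succeeded, the following minimal sketch looks for the GraphViz dot executable on the system path. It assumes that GraphViz was installed so that dot is reachable from the command line; the helper name find_graphviz_dot is hypothetical and only used for illustration.

```python
import shutil
import subprocess


def find_graphviz_dot():
    """Return the full path to the `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_graphviz_dot()
if dot_path is None:
    print("GraphViz 'dot' was not found on PATH; see the GraphViz setup instructions.")
else:
    # `dot -V` reports the installed version (GraphViz writes it to stderr).
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    print((result.stderr or result.stdout).strip())
```

If the message about a missing executable appears even though GraphViz is installed, the installation directory most likely still needs to be added to the PATH environment variable.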

2.4 Dataset

The datasets used in this chapter, as well as the source files of all the
accompanying code, are available online under a permissive open
source license at
https://github.com/psa-lab/predicting-activity-by-machine-learning.


2.5 Additional Resources


If you are unfamiliar with Python and the Python libraries that you
installed in Section 2.2, it is highly recommended to familiarize
yourself with their basic functionality by reading these freely
available resources:
• Python Beginner Guide: https://wiki.python.org/moin/BeginnersGuide
• NumPy Quickstart Tutorial: https://docs.scipy.org/doc/numpy-dev/user/quickstart.html
• Introduction to NumPy: https://sebastianraschka.com/pdf/books/dlb/appendix_f_numpy-intro.pdf
• 10 Minutes to Pandas: http://pandas.pydata.org/pandas-docs/stable/10min.html
• Matplotlib Tutorials: https://matplotlib.org/users/index.html
• An Introduction to Machine Learning Using Scikit-learn: http://scikit-learn.org/stable/tutorial/basic/tutorial.html

3 Methods


This section walks through the individual steps involved in a typical
analysis pipeline for identifying which functional groups and atoms
(or other molecular properties or features) are predictive of the
measured biological activity of the molecules.

3.1 Loading and Inspecting the Biological Activity Dataset


This section explains how to load a CSV-formatted dataset table
(e.g., the DKPES dataset) into a current Python session. A convenient
way to parse a dataset from a tabular plaintext format, such as
CSV, is to use the read_csv function from the Pandas library as
shown in the code example in Fig. 6, which loads the DKPES
dataset into a Pandas DataFrame object (df) for further processing
(see Note 1).
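
The general pattern can be sketched with a small, self-contained example. Note that the table below is a hypothetical stand-in held in memory, not the DKPES dataset: the column names and values are invented for illustration so that the snippet runs without any data file.

```python
import io

import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the column names and values
# are invented for illustration and are NOT taken from the DKPES dataset.
csv_text = io.StringIO(
    "index,feature_a,feature_b,signal\n"
    "mol_1,0,1,0.25\n"
    "mol_2,1,1,0.75\n"
    "mol_3,1,0,0.50\n"
)

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(csv_text)

# Quick sanity checks that the table was parsed as expected.
print(df.shape)     # (number of rows, number of columns)
print(df.head(10))  # first ten rows, or fewer if the table is shorter
```

For a file on disk, the path to the CSV file is passed to read_csv in place of the in-memory object; the location of that file depends on where the dataset was downloaded.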
As a result of executing the code shown in Fig. 6, the
df.head(10) call will display the first ten rows in the dataset, to
confirm that the data file has been parsed correctly. The DKPES
dataset consists of 56 rows, where each row stores the functional

Inferring Activity Discriminants 315