Computational Drug Discovery and Design

states [76, 77] (see Note 8). The HREX approach has low
computational overhead and can result in enhanced sampling, improved
phase space overlap, and faster convergence, while in the worst-case
scenario it is no different from running the calculations
independently [76, 78–80].

4.6 Analyzing the Data


The data collected from all simulations is finally analyzed in order to
obtain an estimate of the binding free energy as the sum of the
smaller free energy differences along the path. In particular, we
need to estimate the two free energy differences for the decoupling
of the ligand from solution ($\Delta G^{\mathrm{solv}}_{\mathrm{coul+vdw}}$)
and from the protein–ligand complex
($\Delta G^{\mathrm{prot}}_{\mathrm{elec+vdw}} + \Delta G^{\mathrm{prot}}_{\mathrm{restr}}$);
these contributions can then be added along with the
$\Delta G^{\mathrm{solv}}_{\mathrm{restr}}$ that was obtained analytically to
recover the final binding free energy (see Note 9). Each separate
free energy estimate will have its associated uncertainty, which will
need to be propagated into the final $\Delta G_b$. Simulation packages that
support alchemical free energy calculations also provide tools for
their analysis, using one or more of the estimators previously
discussed. Alternatively, the alchemical-analysis tool
(https://github.com/MobleyLab/alchemical-analysis) is a Python program that
implements the automated analysis of free energy calculations
performed with Gromacs [26], Amber [31], Sire (http://sire.org), and
Desmond [81], and allows easy access to a number of estimators,
including MBAR, and the best practices mentioned below (see
Note 10) [24, 30]. An example plot for TI obtained with this
tool is shown in Fig. 4.
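The bookkeeping described in this paragraph can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular package. It assumes that the estimates for the individual legs are statistically independent, so that their uncertainties add in quadrature, and that each value already carries the sign required by the thermodynamic cycle in use. The function name, the dictionary keys, and the numbers are hypothetical placeholders, not results.

```python
import math


def combine_legs(legs):
    """Sum free energy contributions and propagate their uncertainties.

    `legs` maps a label to a (free_energy, standard_error) pair, all in the
    same units. Assumption: the legs are independent, so the variance of the
    sum is the sum of the variances.
    """
    total = sum(dg for dg, _ in legs.values())
    sigma = math.sqrt(sum(err ** 2 for _, err in legs.values()))
    return total, sigma


# Placeholder values chosen only to exercise the function; NOT real data.
# The analytical restraint term is given zero statistical uncertainty here.
example_legs = {
    "solv_coul+vdw": (1.0, 0.3),
    "prot_elec+vdw": (2.0, 0.4),
    "prot_restr": (3.0, 0.0),
    "solv_restr (analytical)": (4.0, 0.0),
}

dG_b, dG_b_err = combine_legs(example_legs)
print(f"dG_b = {dG_b:.2f} +/- {dG_b_err:.2f}")
```

If the legs share samples or are otherwise correlated, the quadrature sum no longer holds and the covariance terms would need to be included.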
When analyzing the data obtained from the simulations, it is
first important to make sure the samples are not correlated. In
practice, $\partial U/\partial\lambda$ and $\Delta U_{ij}$ values are
typically printed to file frequently, so consecutive values are likely
to be correlated. One could set a particularly low output frequency
when setting up the calculations, but this is likely to result in the
loss of potentially useful information from the simulations. A common
alternative is to calculate the autocorrelation time $\tau$ of the time
series and then subsample the data by picking one sample every
$1 + 2\tau$ [82]. Once uncorrelated $\partial U/\partial\lambda$ or
$\Delta U_{ij}$ values have been obtained, they can be fed into different
estimators, such as TI and MBAR. Since TI and perturbation
approaches use different information for the free energy estimation
and have different limitations, comparing the results obtained with
the two can be a simple way to check for potential analysis or
sampling issues with the calculations.
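The subsampling step above can be illustrated with a short NumPy sketch. It is only one possible way to estimate the autocorrelation time: the normalized autocorrelation function is summed up to its first non-positive value, a truncation rule assumed here for simplicity, and the result is expressed as 1 + 2τ in units of the output interval. The function names are hypothetical; established analysis tools ship their own, more careful implementations.

```python
import math

import numpy as np


def statistical_inefficiency(series):
    """Estimate g = 1 + 2*tau for a 1D time series.

    tau is the integrated autocorrelation time in units of the sampling
    interval. Assumption: the sum over lags is truncated at the first lag
    where the normalized autocorrelation drops to zero or below.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    if n < 2:
        return 1.0
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 1.0  # constant series: nothing to decorrelate
    tau = 0.0
    for t in range(1, n):
        # Normalized autocorrelation at lag t.
        c = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if c <= 0.0:
            break
        # Down-weight long lags, which are estimated from fewer pairs.
        tau += c * (1.0 - t / n)
    return 1.0 + 2.0 * tau


def subsample(series, g=None):
    """Keep one sample every ceil(g) frames, with g = 1 + 2*tau."""
    x = np.asarray(series)
    if g is None:
        g = statistical_inefficiency(x)
    stride = max(1, math.ceil(g))
    return x[::stride]
```

In use, the same function would be applied to the ∂U/∂λ (or ΔU_ij) time series of each λ window separately, since the correlation time can differ between windows. Rounding the stride up errs on the side of discarding slightly more data rather than retaining correlated samples.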
It is common to exclude from the analysis an initial portion of
the simulations, as it is expected to contain nonequilibrated
samples. The exact determination of the nonequilibrated region of the
simulations is, however, not trivial. In fact, if extensive equilibration
is performed prior to the production runs, some systems may reach
equilibrium before the data needed for the analysis even starts

218 Matteo Aldeghi et al.
