Computational Drug Discovery and Design

states [76, 77] (see Note 8). The HREX approach has low computational overhead and can result in enhanced sampling, improved phase space overlap, and faster convergence, while in the worst-case scenario it is no different from running the calculations independently [76, 78–80].

4.6 Analyzing the Data

The data collected from all simulations are finally analyzed to obtain an estimate of the binding free energy as the sum of the smaller free energy differences along the path. In particular, we need to estimate the two free energy differences for the decoupling of the ligand from solution (ΔG^{solv}_{coul+vdw}) and from the protein–ligand complex (ΔG^{prot}_{elec+vdw} + ΔG^{prot}_{restr}); these contributions can then be added, along with the ΔG^{solv}_{restr} that was obtained analytically, to recover the final binding free energy (see Note 9). Each separate free energy estimate has an associated uncertainty, which needs to be propagated into the final ΔG_b.

Simulation packages that support alchemical free energy calculations also provide tools for their analysis, using one or more of the estimators previously discussed. Alternatively, the alchemical-analysis tool (https://github.com/MobleyLab/alchemical-analysis) is a Python program that automates the analysis of free energy calculations performed with Gromacs [26], Amber [31], Sire (http://sire.org), and Desmond [81], and gives easy access to a number of estimators, including MBAR, and to the best practices mentioned below (see Note 10) [24, 30]. An example plot for TI obtained with this tool is shown in Fig. 4.

When analyzing the data obtained from the simulations, it is first important to make sure the samples are not correlated. In practice, ∂U/∂λ and ΔU_ij values are typically printed to file frequently, and their values are likely correlated. One could set a particularly low output frequency when setting up the calculations, but this is likely to result in the loss of potentially useful information from the simulations. A common approach is instead to calculate the autocorrelation time τ of the time series, and then subsample the data by picking one sample every 1 + 2τ [82]. Once uncorrelated ∂U/∂λ or ΔU_ij values have been obtained, they can be fed into different estimators, such as TI and MBAR.
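The subsampling step described above can be illustrated with a minimal Python sketch. It is not the implementation used by alchemical-analysis or by any simulation package. The function names are hypothetical, τ is taken to be the integrated autocorrelation time in units of saved samples, and the sum over the autocorrelation function is truncated at its first non-positive value, which is a simple heuristic assumed here for illustration.

```python
import numpy as np


def integrated_autocorrelation_time(series):
    """Estimate the integrated autocorrelation time tau (in samples).

    Assumption of this sketch: the normalized autocorrelation function
    C(t) is summed, with weights (1 - t/n), until it first becomes
    non-positive.
    """
    x = np.asarray(series, dtype=float)
    n = x.size
    dx = x - x.mean()
    var = np.dot(dx, dx) / n
    if var == 0.0:
        return 0.0  # constant series: no correlation to remove
    tau = 0.0
    for t in range(1, n):
        # Normalized autocorrelation at lag t
        c_t = np.dot(dx[:-t], dx[t:]) / ((n - t) * var)
        if c_t <= 0.0:
            break  # truncate once C(t) has decayed into the noise
        tau += (1.0 - t / n) * c_t
    return tau


def subsample_uncorrelated(series):
    """Keep one sample every g = 1 + 2*tau frames (rounded up).

    Returns the subsampled array and the value of g.
    """
    x = np.asarray(series)
    g = 1.0 + 2.0 * integrated_autocorrelation_time(x)
    stride = int(np.ceil(g))
    return x[::stride], g
```

Here `series` would be the ∂U/∂λ time series of a single λ window (or one ΔU_ij time series), and the same stride would be applied to every quantity saved from that window. Rounding the stride up is a conservative choice: it discards slightly more data than strictly necessary rather than retaining correlated samples.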
Since TI and perturbation approaches use different information for the free energy estimation and have different limitations, comparing the results obtained with the two can be a simple way to check for potential analysis or sampling issues with the calculations. It is common to exclude from the analysis an initial portion of the simulations, as it is expected to contain nonequilibrated samples. The exact determination of the nonequilibrated region of the simulations is, however, not trivial. In fact, if extensive equilibration is performed prior to the production runs, some systems may reach equilibrium before the data needed for the analysis even starts

218 Matteo Aldeghi et al.

