A.6 Regression analysis
In medicinal chemistry, it is often desirable to obtain mathematical relationships
in the form of equations between sets of data, which have been obtained from
experimental work or calculated using theoretical considerations. Regression
analysis is a group of mathematical methods used to obtain such relationships.
The data is fed into a suitable computer program, which on execution produces
an equation that represents the line that is the best fit for that data. For example,
an investigation indicated that the relationship between the activity and the
partition coefficients of a number of related compounds appeared to be linear
(Figure A6.1). Consequently, this data could be represented mathematically in
the form of the straight-line equation y = mx + c. Regression analysis would
calculate the values of m and c that gave the line of best fit to the data. When one
is dealing with a linear relationship the analysis is usually carried out using the
method of least squares.
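As an illustration, the least squares calculation for a straight line can be sketched in a few lines of Python. This is a minimal sketch rather than the output of any particular regression program: the function name least_squares_line and the data values are hypothetical, chosen only to show how m and c follow from the sums of squares and cross-products of the deviations from the means.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for the line y = m*x + c that minimises the
    sum of the squared vertical distances of the points from the line."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-products of deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = sxy / sxx              # slope
    c = mean_y - m * mean_x    # intercept: the line passes through the means
    return m, c


# Hypothetical data, invented purely for illustration
log_p = [0.5, 1.0, 1.5, 2.0, 2.5]        # log P values
log_inv_c = [1.1, 1.6, 2.1, 2.4, 3.0]    # log 1/C values

m, c = least_squares_line(log_p, log_inv_c)
print(f"log 1/C = {m:.3f} log P + {c:.3f}")
```

In practice a statistics package would normally be used, but the arithmetic it performs for a single straight line is essentially that shown here.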
Regression equations do not indicate the accuracy and spread of the data.
Consequently, they are normally accompanied by additional data, which as a
minimum requirement should include the number of observations used (n), the
standard deviation of the observations (s) and the correlation coefficient (r).
The value of the correlation coefficient is a measure of how closely the data
matches the equation. Its magnitude varies from zero to one, its sign following the slope of the line. A value of r = 1 indicates a
perfect match. In medicinal chemistry r values greater than 0.9 are usually
regarded as representing an acceptable degree of accuracy, provided they
are obtained using a reasonable number of results with a suitable standard
deviation.
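The three accompanying statistics can be calculated alongside the fitted line, as in the following sketch. Several points here are assumptions rather than statements from the text: s is taken to be the standard deviation of the observations about the fitted line, calculated with n - 2 degrees of freedom; r is taken to be the Pearson correlation coefficient, which carries the sign of the slope; and the function name regression_summary and the data are hypothetical.

```python
import math


def regression_summary(xs, ys):
    """Fit y = m*x + c by least squares and return m, c, n, s and r."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each take more than one value")
    m = sxy / sxx
    c = mean_y - m * mean_x
    # Pearson correlation coefficient (assumed definition of r)
    r = sxy / math.sqrt(sxx * syy)
    # Residual sum of squares about the fitted line
    ss_res = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    # Assumed definition of s: residual standard deviation, n - 2 degrees of freedom
    s = math.sqrt(ss_res / (n - 2))
    return {"m": m, "c": c, "n": n, "s": s, "r": r}


# Hypothetical data, invented purely for illustration
log_p = [0.5, 1.0, 1.5, 2.0, 2.5]
log_inv_c = [1.1, 1.6, 2.1, 2.4, 3.0]

stats = regression_summary(log_p, log_inv_c)
print(f"log 1/C = {stats['m']:.2f} log P + {stats['c']:.2f}")
print(f"n = {stats['n']}, s = {stats['s']:.3f}, r = {stats['r']:.3f}")
```

Reporting n, s and r together with the equation allows a reader to judge whether a high value of r rests on enough observations to be meaningful.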
Figure A6.1 A hypothetical plot of the activity (log 1/C) of a series of compounds against the
logarithm of their partition coefficients (log P)