A4 Finding the optimum of a function 563
thex-axis. At the pointx 1 , the gradient is along they-axis:
∂f
∂x
=0. (A.13)
It now remains to set the derivative alongyto zero as well. To this end, we move
up they-axis, but unfortunately the gradient in thex-direction will start deviating
from 0 again, so that, when we arrive at the pointx 2 where they-gradient vanishes,
we must move again in thex-direction, whereby they-gradient starts deviating
from zero and so on. Obviously, it may take a large number of steps to reduce both
derivatives to zero – only in the case where the contour lines are circular will the
minimum be found in two steps. The more eccentric the elliptic contours are, the
more steps are needed in order to find the minimum, and the minimum is approached
along a zigzag path.
To see how we can do a better job, we first realise that close to a minimum a
Taylor expansion offhas the form
f(x)=f 0 +^12 x·Hx+···, (A.14)
where we have assumed that the minimum is located at the origin and thatfassumes
the valuef 0 there (for the general case, a similar analysis applies).His theHessian
matrix, defined as
Hij=
∂^2 f(x 0 )
∂xi∂xj
. (A.15)
Note thatHis symmetric if the partial derivatives offare continuous; moreoverHis
a positive definite matrix (positive definite means that all the diagonal elements ofH
are positive), otherwise we would not be dealing with a minimum but with a saddle
point or a maximum. For the moment, we restrict ourselves to quadratic functions
of the form (A.14) – below we shall see how the same method can be applied to
general functions. The method we describe is called theconjugate gradient method.
If we move from a pointxialong a directionhi, we find the pointxi+ 1 by a line
minimisation. For this pointxi+ 1 we have
hi·∇f(xi+ 1 )=0. (A.16)
We want to move in a new directionhi+ 1 along which the gradient alonghiremains
zero; therefore the gradient is allowed to change only in a direction perpendicular
tohi. If we move from the pointxiover a vectorλhi+ 1 , the change in the negative
gradient is given by
g=−λHhi+ 1 (A.17)
and we want this change to be orthogonal to the directionhiof the previous line
minimisation, and this implies that
hi·Hhi+ 1 =0. (A.18)