Vectors $h_i$ and $h_{i+1}$ satisfying this requirement are called conjugate. Since we have guaranteed the gradient along $h_i$ to remain zero along the new path, and as the second minimisation will annihilate the gradient along $h_{i+1}$ (which is independent of $h_i$), at the next point the gradient is zero along two independent directions, and in the two-dimensional case we have arrived at the minimum of $f$: in two dimensions, only two steps are needed to arrive at the minimum of a quadratic function. For $n$ dimensions, we need $n$ steps to arrive at the minimum. This is less obvious than it seems. Remember that the second step (along $h_{i+1}$) was designed such that it does not spoil the gradient along the direction $h_i$. The third step will be designed such that it does not spoil the gradient along the direction $h_{i+1}$. But does it leave the gradient along $h_i$ unchanged as well? For this to be true at all stages of the iteration, all the directions $h_i$ should be conjugate:


$$ h_i \cdot H h_j = 0, \qquad \text{for all } i \neq j. \tag{A.19} $$

If $H$ is the unit matrix, this requirement is equivalent to the set $\{h_i\}$ being orthogonal. For a symmetric and positive definite Hessian matrix, the set $\{H^{1/2} h_i\}$ is orthogonal for a conjugate set $\{h_i\}$.
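
As a quick numerical check of (A.19) and of the $H^{1/2}$ statement, the following sketch (not part of the text; it assumes NumPy and an arbitrary small positive definite $H$) constructs a conjugate set by Gram–Schmidt orthogonalisation in the $H$-inner product $\langle u, v\rangle_H = u \cdot H v$ and verifies both properties.

```python
import numpy as np

# A small symmetric positive definite "Hessian" (illustrative values, not from the text)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
H = A @ A.T + 4 * np.eye(4)                      # symmetric, positive definite

# Build an H-conjugate set by Gram-Schmidt in the H-inner product <u, v>_H = u . H v
hs = []
for v in np.eye(4):                              # start from the standard basis
    h = v.copy()
    for hj in hs:
        h -= (h @ H @ hj) / (hj @ H @ hj) * hj   # remove H-components along earlier directions
    hs.append(h)

# Conjugacy check: h_i . H h_j = 0 for i != j    (eq. A.19)
G = np.array([[hi @ H @ hj for hj in hs] for hi in hs])
print(np.allclose(G, np.diag(np.diag(G))))       # True: off-diagonal elements vanish

# Check that the vectors y_i = H^{1/2} h_i are mutually orthogonal
w, V = np.linalg.eigh(H)
sqrtH = V @ np.diag(np.sqrt(w)) @ V.T            # matrix square root via eigendecomposition
ys = [sqrtH @ h for h in hs]
O = np.array([[yi @ yj for yj in ys] for yi in ys])
print(np.allclose(O, np.diag(np.diag(O))))       # True: y_i . y_j = 0 for i != j
```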
To see perhaps more clearly where the magic of the conjugate gradients arises from, we define new vectors $y_i = \sqrt{H}\, h_i$. We can always take the square root of $H$ as this matrix has positive, real eigenvalues (see also Section 3.3). In terms of these new vectors we have the trivial problem of minimising

$$ y \cdot y. \tag{A.20} $$

Starting at some arbitrary point, we can simply follow a sequence of orthogonal directions, minimising this expression along each of them. Now it is trivial that these minimisations do not spoil each other. But the fact that the vectors in the set $\{y_i\}$ are orthogonal implies

$$ h_i \cdot H h_j = 0 \tag{A.21} $$

for $i \neq j$, that is, the directions $h_i$ form a conjugate set. Each sequence of conjugate search directions therefore corresponds to a sequence of orthogonal search directions $y_i$.
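
To make the change of variables fully explicit, here is a short worked derivation (a sketch added for clarity; the quadratic form of $f$ is an assumption consistent with the surrounding discussion):

```latex
% Change of variables y = H^{1/2} x for the quadratic f(x) = (1/2) x.Hx - b.x
% (the form of f is an assumption consistent with the surrounding text).
\[
  f(x) \;=\; \tfrac{1}{2}\, x \cdot H x \;-\; b \cdot x
       \;=\; \tfrac{1}{2}\, y \cdot y \;-\; \tilde b \cdot y,
  \qquad y = H^{1/2} x, \;\; \tilde b = H^{-1/2} b .
\]
% Minimising along an orthonormal direction e_k fixes the k-th coordinate of y
% at \tilde b_k and never disturbs the other coordinates, so n exact line
% minimisations along n orthogonal directions reach the minimum y = \tilde b.
% Pulling the directions back via h_k = H^{-1/2} e_k gives
\[
  h_i \cdot H h_j \;=\; e_i \cdot H^{-1/2} H H^{-1/2} e_j
                  \;=\; e_i \cdot e_j \;=\; 0
  \qquad (i \neq j),
\]
% which is exactly the conjugacy condition (A.19).
```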
How can we construct a sequence of conjugate directions? It turns out that the following recipe works. We start with the negative gradient $g_0$ at $x_0$ and we take $h_0 = g_0$. The following algorithm constructs a new set of vectors $x_{i+1}$, $g_{i+1}$, $h_{i+1}$ from the previous one $(x_i, g_i, h_i)$:

$$
\begin{aligned}
g_{i+1} &= g_i - \lambda_i H h_i; & \text{(A.22a)}\\
h_{i+1} &= g_{i+1} + \gamma_i h_i; & \text{(A.22b)}\\
x_{i+1} &= x_i + \lambda_i h_i. & \text{(A.22c)}
\end{aligned}
$$
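
As an illustration, here is a minimal NumPy sketch of the recursion (A.22) applied to the quadratic $f(x) = \frac{1}{2} x \cdot Hx - b \cdot x$. The coefficients $\lambda_i$ and $\gamma_i$ are not specified in this excerpt, so the standard choices $\lambda_i = g_i \cdot g_i / (h_i \cdot H h_i)$ (exact line minimisation) and $\gamma_i = g_{i+1} \cdot g_{i+1} / (g_i \cdot g_i)$ are assumed here.

```python
import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-12):
    """Minimise f(x) = x.Hx/2 - b.x via the recursion (A.22).

    The formulas for lambda_i and gamma_i are the standard choices
    (an assumption; they are not given in this excerpt).
    """
    x = x0.astype(float)
    g = b - H @ x                      # negative gradient at x0
    h = g.copy()                       # h_0 = g_0
    for _ in range(len(b)):            # at most n steps for an n-dim quadratic
        if g @ g < tol:
            break
        Hh = H @ h
        lam = (g @ g) / (h @ Hh)       # exact line minimisation along h_i
        x = x + lam * h                # (A.22c)
        g_new = g - lam * Hh           # (A.22a)
        gamma = (g_new @ g_new) / (g @ g)
        h = g_new + gamma * h          # (A.22b)
        g = g_new
    return x

# Example: a 2-dimensional quadratic is minimised in two steps
H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(H, b, np.zeros(2))
print(np.allclose(H @ x, b))           # True: the gradient vanishes at the minimum
```

With these choices each step performs an exact line minimisation along $h_i$, so for an $n$-dimensional quadratic the loop terminates after at most $n$ iterations, in line with the argument above.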