##### 238 5. NEURAL NETWORKS

where cubic and higher terms have been omitted. Here $\mathbf{b}$ is defined to be the gradient of $E$ evaluated at $\widehat{\mathbf{w}}$

$$
\mathbf{b} \equiv \left.\nabla E\right|_{\mathbf{w}=\widehat{\mathbf{w}}} \tag{5.29}
$$

and the Hessian matrix $\mathbf{H} = \nabla\nabla E$ has elements

$$
(\mathbf{H})_{ij} \equiv \left.\frac{\partial^2 E}{\partial w_i\,\partial w_j}\right|_{\mathbf{w}=\widehat{\mathbf{w}}}. \tag{5.30}
$$

From (5.28), the corresponding local approximation to the gradient is given by

$$
\nabla E \simeq \mathbf{b} + \mathbf{H}(\mathbf{w}-\widehat{\mathbf{w}}). \tag{5.31}
$$

For points $\mathbf{w}$ that are sufficiently close to $\widehat{\mathbf{w}}$, these expressions will give reasonable approximations for the error and its gradient.
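The expansion (5.28) and its gradient (5.31) can be checked numerically. The sketch below uses Python with NumPy; the particular error function $E(\mathbf{w}) = w_1^4 + w_1^2 w_2^2 + w_2^2$ and the expansion point $\widehat{\mathbf{w}}$ are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative error function (an assumption for this sketch, not from the text):
# E(w) = w1^4 + w1^2 * w2^2 + w2^2
def E(w):
    return w[0]**4 + w[0]**2 * w[1]**2 + w[1]**2

def grad_E(w):
    # Analytic gradient of E
    return np.array([4*w[0]**3 + 2*w[0]*w[1]**2,
                     2*w[0]**2*w[1] + 2*w[1]])

def hess_E(w):
    # Analytic Hessian of E
    return np.array([[12*w[0]**2 + 2*w[1]**2, 4*w[0]*w[1]],
                     [4*w[0]*w[1],            2*w[0]**2 + 2]])

w_hat = np.array([0.5, -0.3])   # arbitrary expansion point
b = grad_E(w_hat)               # b, as in (5.29)
H = hess_E(w_hat)               # H, as in (5.30)

# Evaluate the quadratic approximation (5.28) and its gradient (5.31)
# at a point close to w_hat.
w = w_hat + np.array([0.01, -0.02])
d = w - w_hat
E_quad = E(w_hat) + b @ d + 0.5 * d @ H @ d
grad_quad = b + H @ d

# The truncation errors are small: O(||d||^3) for E, O(||d||^2) for the gradient.
print(abs(E(w) - E_quad))
print(np.linalg.norm(grad_E(w) - grad_quad))
```

The printed residuals shrink cubically (for the error) and quadratically (for the gradient) as $\mathbf{w}$ approaches $\widehat{\mathbf{w}}$, which is exactly the sense in which the approximations are "reasonable" for nearby points.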

Consider the particular case of a local quadratic approximation around a point $\mathbf{w}^\star$ that is a minimum of the error function. In this case there is no linear term, because $\nabla E = 0$ at $\mathbf{w}^\star$, and (5.28) becomes

$$
E(\mathbf{w}) = E(\mathbf{w}^\star) + \frac{1}{2}(\mathbf{w}-\mathbf{w}^\star)^{\mathrm{T}}\mathbf{H}(\mathbf{w}-\mathbf{w}^\star) \tag{5.32}
$$

where the Hessian $\mathbf{H}$ is evaluated at $\mathbf{w}^\star$. In order to interpret this geometrically, consider the eigenvalue equation for the Hessian matrix

$$
\mathbf{H}\mathbf{u}_i = \lambda_i \mathbf{u}_i \tag{5.33}
$$

where the eigenvectors $\mathbf{u}_i$ form a complete orthonormal set (Appendix C) so that

$$
\mathbf{u}_i^{\mathrm{T}}\mathbf{u}_j = \delta_{ij}. \tag{5.34}
$$

We now expand $(\mathbf{w}-\mathbf{w}^\star)$ as a linear combination of the eigenvectors in the form

$$
\mathbf{w}-\mathbf{w}^\star = \sum_i \alpha_i \mathbf{u}_i. \tag{5.35}
$$

This can be regarded as a transformation of the coordinate system in which the origin is translated to the point $\mathbf{w}^\star$, and the axes are rotated to align with the eigenvectors (through the orthogonal matrix whose columns are the $\mathbf{u}_i$), and is discussed in more detail in Appendix C. Substituting (5.35) into (5.32), and using (5.33) and (5.34), allows the error function to be written in the form

$$
E(\mathbf{w}) = E(\mathbf{w}^\star) + \frac{1}{2}\sum_i \lambda_i \alpha_i^2. \tag{5.36}
$$
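For an error that is exactly quadratic, as in (5.32), the eigenvector expansion (5.36) holds exactly, and this is easy to verify numerically. In the sketch below (NumPy), the Hessian, the minimum $\mathbf{w}^\star$, and the value $E(\mathbf{w}^\star)$ are arbitrary illustrative choices; the coefficients $\alpha_i$ of (5.35) are obtained by projecting $\mathbf{w}-\mathbf{w}^\star$ onto the orthonormal eigenvectors, i.e. $\boldsymbol{\alpha} = \mathbf{U}^{\mathrm{T}}(\mathbf{w}-\mathbf{w}^\star)$.

```python
import numpy as np

# Arbitrary symmetric Hessian, minimum point, and minimum value
# (illustrative assumptions, not from the text).
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])
w_star = np.array([1.0, -1.0])
E_min = 0.7

def E(w):
    # Exactly quadratic error of the form (5.32)
    d = w - w_star
    return E_min + 0.5 * d @ H @ d

# Eigendecomposition (5.33); the columns of U are the orthonormal u_i,
# so U.T @ U is the identity, matching (5.34).
lam, U = np.linalg.eigh(H)

w = np.array([1.4, -0.5])        # arbitrary evaluation point
alpha = U.T @ (w - w_star)       # expansion coefficients from (5.35)

# Right-hand side of (5.36): E(w_star) + 1/2 * sum_i lambda_i * alpha_i^2
E_eig = E_min + 0.5 * np.sum(lam * alpha**2)
print(np.isclose(E(w), E_eig))   # True: the two expressions agree
```

Because the eigenvectors are orthonormal, projecting onto them and rotating back is lossless, which is why the two ways of computing $E(\mathbf{w})$ agree to floating-point precision.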

A matrix $\mathbf{H}$ is said to be *positive definite* if, and only if,

$$
\mathbf{v}^{\mathrm{T}}\mathbf{H}\mathbf{v} > 0 \quad \text{for all } \mathbf{v}. \tag{5.37}
$$
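As a quick numerical illustration, condition (5.37) can be spot-checked on random directions. The matrix below is an arbitrary choice assumed for this sketch; the final check uses the standard linear-algebra fact that a symmetric matrix is positive definite exactly when all of its eigenvalues are positive.

```python
import numpy as np

# Arbitrary symmetric matrix chosen for illustration; its eigenvalues
# are (5 ± sqrt(5))/2, both positive, so it is positive definite.
H = np.array([[3.0, 1.0],
              [1.0, 2.0]])

# Spot-check (5.37) on many random nonzero directions v.
rng = np.random.default_rng(0)
vs = rng.standard_normal((1000, 2))
all_positive = all(v @ H @ v > 0 for v in vs)
print(all_positive)   # True for this H

# Equivalent criterion: all eigenvalues of the symmetric matrix H are positive.
print(np.linalg.eigvalsh(H))
```

Random sampling can only refute positive definiteness, never prove it; the eigenvalue test is the conclusive check.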