5.4. THE DELTA RULE
that a step activation function models neurons that either fire or do not fire, while continuous activation functions such as sigmoids model neurons that provide a gradual degree of firing, a "fuzzy property" that is more realistic.
The error $E$ is a function of the variables $w_{jk}$, $j = 1, 2, \ldots, m$, $k = 1, 2, \ldots, n$. Recall that the gradient $\nabla E$ of $E$ at a point $w$ with components $w_{jk}$ is the vector of partial derivatives $\partial E/\partial w_{jk}$. Like the derivative of a function of one variable, the gradient always points in the uphill direction of the function $E$. The downhill (steepest-descent) direction of $E$ at $w$ is $-\nabla E$. Thus, to minimize $E$, we move proportionally to the negative of $\nabla E$, leading to the updating of each weight $w_{jk}$ as
\[ w_{jk} \longrightarrow w_{jk} + \Delta w_{jk} \]
where
\[ \Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} \]
and $\eta > 0$ is a number called the learning rate.
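As a quick numerical illustration, the update $w \to w - \eta\,\partial E/\partial w$ can be sketched in Python. The one-variable error surface $E(w) = (w-3)^2$ and the value $\eta = 0.1$ are illustrative assumptions, not from the text:

```python
# A minimal sketch of gradient descent: w -> w + dw with dw = -eta * dE/dw,
# on the hypothetical error E(w) = (w - 3)^2, whose minimizer is w = 3.

def grad_E(w):
    # dE/dw for E(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

eta = 0.1          # learning rate (illustrative choice)
w = 0.0            # arbitrary starting point
for _ in range(100):
    w = w - eta * grad_E(w)   # move against the gradient (downhill)

print(round(w, 4))  # -> 3.0, the minimizer of E
```

Each step shrinks the distance to the minimizer by the factor $1 - 2\eta$, so the iterates converge geometrically toward $w = 3$.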
We have
\[ \frac{\partial E}{\partial w_{jk}} = \sum_{q=1}^{N} \frac{\partial E_q}{\partial w_{jk}} \]
and
\[ \frac{\partial E_q}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \left( \frac{1}{2} \sum_{i=1}^{m} \left( y_i^q - o_i^q \right)^2 \right) = \left( o_j^q - y_j^q \right) \frac{\partial}{\partial w_{jk}} \, f_j\!\left( \sum_{i=0}^{n} w_{ji} x_i^q \right) \]
since
\[ \frac{\partial}{\partial w_{jk}} \left( o_i^q - y_i^q \right) = 0 \quad \text{for } i \neq j, \]
noting that
\[ o_j^q = f_j\!\left( \sum_{i=0}^{n} w_{ji} x_i^q \right). \]
Thus,
\[ \frac{\partial E_q}{\partial w_{jk}} = x_k^q \left( o_j^q - y_j^q \right) f_j'\!\left( \sum_{i=0}^{n} w_{ji} x_i^q \right) = \delta_j^q \, x_k^q, \]
where
\[ \delta_j^q = \left( o_j^q - y_j^q \right) f_j'\!\left( \sum_{i=0}^{n} w_{ji} x_i^q \right) \]
for the $j$th output neuron.
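The resulting delta rule, applying $\Delta w_{jk} = -\eta\, \delta_j^q x_k^q$ for each training pattern $q$, can be sketched for a single sigmoid output neuron (so the index $j$ is dropped). The AND-function training data, learning rate, and iteration count below are illustrative assumptions, not from the text:

```python
import math

# Sketch of the delta rule for one sigmoid output neuron, following the
# derivation above: delta = (o - y) * f'(net), and each weight w[k] is
# updated by -eta * delta * x[k].

def f(net):
    # sigmoid activation; its derivative satisfies f'(net) = o * (1 - o)
    return 1.0 / (1.0 + math.exp(-net))

# training patterns (x, y); x[0] = 1 is the constant bias input
patterns = [([1, 0, 0], 0), ([1, 0, 1], 0), ([1, 1, 0], 0), ([1, 1, 1], 1)]
w = [0.0, 0.0, 0.0]   # weights, including the bias weight w[0]
eta = 0.5             # learning rate (illustrative choice)

for _ in range(5000):
    for x, y in patterns:
        net = sum(w[i] * x[i] for i in range(len(w)))
        o = f(net)
        delta = (o - y) * o * (1.0 - o)   # delta = (o - y) * f'(net)
        for k in range(len(w)):
            w[k] -= eta * delta * x[k]    # w_k -> w_k - eta * delta * x_k

outputs = [round(f(sum(w[i] * x[i] for i in range(3)))) for x, _ in patterns]
print(outputs)
```

After training, the rounded outputs match the AND targets, illustrating that the rule drives each $o^q$ toward its target $y^q$ on linearly separable data.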