that a step activation function models neurons that either fire or do not fire, while continuous activation functions model neurons such as sigmoids that provide a gradual degree of firing, a "fuzzy property" that is more realistic.
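As a quick illustration (not from the text), the following Python sketch contrasts the two kinds of activation; the function names are made up for this example.

    import numpy as np

    def step(s):
        # Step activation: the neuron either fires (1) or does not fire (0).
        return np.where(s >= 0.0, 1.0, 0.0)

    def sigmoid(s):
        # Sigmoid activation: a gradual, "fuzzy" degree of firing in (0, 1).
        return 1.0 / (1.0 + np.exp(-s))

    s = np.linspace(-4.0, 4.0, 5)
    print(step(s))     # abrupt jump from 0 to 1
    print(sigmoid(s))  # smooth values between 0 and 1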
The error $E$ is a function of the weights $w_{jk}$, $j = 1, 2, \ldots, m$, $k = 0, 1, \ldots, n$. Recall that the gradient $\nabla E$ of $E$ at a point $w$ with components $w_{jk}$ is the vector of partial derivatives $\partial E / \partial w_{jk}$. Like the derivative of a function of one variable, the gradient always points in the uphill (steepest ascent) direction of the function $E$. The downhill (steepest descent) direction of $E$ at $w$ is $-\nabla E$. Thus, to minimize $E$, we move proportionally to the negative of $\nabla E$, leading to the updating of each weight $w_{jk}$ as
$$ w_{jk} \longrightarrow w_{jk} + \Delta w_{jk} $$
where
$$ \Delta w_{jk} = -\eta\,\frac{\partial E}{\partial w_{jk}} $$
and $\eta > 0$ is a number called the learning rate.
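To make the update concrete, here is a minimal gradient-descent sketch in Python on a made-up one-variable error $E(w) = (w - 3)^2$; the numbers are illustrative only.

    # Minimal gradient-descent sketch on the made-up error E(w) = (w - 3)^2,
    # whose derivative is dE/dw = 2(w - 3).
    eta = 0.1            # learning rate, eta > 0
    w = 0.0              # arbitrary starting weight
    for _ in range(50):
        grad = 2.0 * (w - 3.0)   # dE/dw at the current w
        w += -eta * grad         # Delta w = -eta * dE/dw
    print(w)  # close to the minimizer w = 3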
We have
$$ \frac{\partial E}{\partial w_{jk}} = \sum_{q=1}^{N} \frac{\partial E_q}{\partial w_{jk}} $$
and
$$ \frac{\partial E_q}{\partial w_{jk}} = \frac{\partial}{\partial w_{jk}} \left( \frac{1}{2} \sum_{i=1}^{m} \left( y_{qi} - o_{qi} \right)^2 \right) = \left( o_{qj} - y_{qj} \right) \frac{\partial}{\partial w_{jk}} f_j\!\left( \sum_{i=0}^{n} w_{ji} x_{qi} \right) $$
since
$$ \frac{\partial}{\partial w_{jk}} \left( o_{qi} - y_{qi} \right) = 0 \quad \text{for } i \neq j, $$
noting that
$$ o_{qj} = f_j\!\left( \sum_{i=0}^{n} w_{ji} x_{qi} \right). $$
Thus,
$$ \frac{\partial E_q}{\partial w_{jk}} = x_{qk} \left( o_{qj} - y_{qj} \right) f_j'\!\left( \sum_{i=0}^{n} w_{ji} x_{qi} \right) = \delta_{qj}\, x_{qk} $$
where
$$ \delta_{qj} = \left( o_{qj} - y_{qj} \right) f_j'\!\left( \sum_{i=0}^{n} w_{ji} x_{qi} \right) $$
for the $j$th output neuron.
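Putting the pieces together, the following Python sketch applies one batch step of this rule to a single layer of sigmoid neurons. It is a minimal illustration, assuming sigmoid activations (for which $f'(s) = f(s)(1 - f(s))$) and made-up function names and data; it is not code from the text.

    import numpy as np

    def sigmoid(s):
        # f(s) = 1 / (1 + e^(-s)); its derivative is f'(s) = f(s)(1 - f(s)).
        return 1.0 / (1.0 + np.exp(-s))

    def delta_rule_step(W, X, Y, eta=0.5):
        # One batch gradient step: Delta w_jk = -eta * sum_q delta_qj * x_qk.
        # W : (m, n+1) weights, W[j, k] = w_jk (column 0 holds the bias weights)
        # X : (N, n+1) inputs with x_q0 = 1 for every sample q
        # Y : (N, m)   target outputs y_qj
        grad = np.zeros_like(W)                 # accumulates sum_q dE_q/dw_jk
        for q in range(X.shape[0]):
            s = W @ X[q]                        # s_j = sum_i w_ji x_qi
            o = sigmoid(s)                      # o_qj = f_j(s_j)
            delta = (o - Y[q]) * o * (1.0 - o)  # delta_qj = (o_qj - y_qj) f'_j(s_j)
            grad += np.outer(delta, X[q])       # dE_q/dw_jk = delta_qj * x_qk
        return W - eta * grad                   # w_jk -> w_jk + Delta w_jk

    # Tiny made-up usage: n = 2 inputs plus a bias, m = 1 output neuron, N = 4.
    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((4, 1)), rng.standard_normal((4, 2))])
    Y = rng.uniform(size=(4, 1))
    W = 0.1 * rng.standard_normal((1, 3))
    for _ in range(100):
        W = delta_rule_step(W, X, Y)

Accumulating the gradient over all $q$ before updating matches the batch sum $\partial E / \partial w_{jk} = \sum_{q=1}^{N} \partial E_q / \partial w_{jk}$ derived above; updating after each training pattern instead would give the incremental (online) variant of the rule.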