176 CHAPTER 5. NEURAL NETWORKS FOR CONTROL
Figure 5.5. Network with weights $w_{ij}$

The goal is to minimize $E$ with respect to the weights $w_{ij}$ of the network.
Let $w_{ij}$ denote the weight from node $j$ to the output neuron $i$. To use the
gradient descent method for optimization, we need $E$ to be differentiable with
respect to the $w_{ij}$'s.
This boils down to requiring
$$o_i^q = f_i\left(\sum_{j=0}^{n} w_{ij} x_j^q\right)$$
to be differentiable. That is, the activation function $f_i$ of the $i$th neuron should
be chosen to be a differentiable function. Note that step functions such as
$$f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$
are not differentiable.
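The difference between the two kinds of activation can be checked numerically. The following is a minimal sketch, not taken from the text: it evaluates the output $o_i^q$ of a single neuron and compares symmetric difference quotients of the step function and of the logistic sigmoid $1/(1+e^{-x})$ at $x = 0$. The weights, the inputs, and the helper names `neuron_output` and `central_difference` are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid f(x) = 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def step(x):
    """Step function: 1 if x >= 0, otherwise 0."""
    return 1.0 if x >= 0 else 0.0


def neuron_output(weights, inputs, activation):
    """o_i^q = f_i(sum_j w_ij * x_j^q) for one neuron i and one pattern q."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return activation(net)


def central_difference(f, x, h):
    """Symmetric difference quotient (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Hypothetical weights w_i0..w_i2 and inputs x_0^q..x_2^q (illustration only).
w_i = [0.5, -1.0, 2.0]
x_q = [1.0, 0.3, 0.1]
o = neuron_output(w_i, x_q, sigmoid)
print("sigmoid neuron output:", o)

# Shrink h and watch the difference quotients at x = 0.
for h in (1e-1, 1e-2, 1e-3):
    print(h, central_difference(sigmoid, 0.0, h), central_difference(step, 0.0, h))
```

As $h$ shrinks, the quotient for the sigmoid settles near $0.25$, its derivative at $0$, while the quotient for the step function equals $1/(2h)$ and grows without bound, so no gradient exists at the jump.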
The sigmoid activation function, shown in Figure 5.6, is differentiable. Note
Figure 5.6. $f(x) = \frac{1}{1+e^{-x}}$