So far so good. But all this assumes that there is no hidden layer. With a hidden layer, things get a little trickier. Suppose f(x_i) is the output of the ith hidden unit, w_ij is the weight of the connection from input j to the ith hidden unit, and w_i is the weight of the connection from the ith hidden unit to the output unit. The situation is depicted in Figure 6.13. As before, f(x) is the output of the single unit in the output layer. The update rule for the weights w_i is essentially the same as above, except that a_i is replaced by the output of the ith hidden unit:

\frac{dE}{dw_i} = -(y - f(x))\, f'(x)\, f(x_i).
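To make the formula concrete, here is a minimal numerical sketch (not from the text): it assumes a sigmoid activation f(t) = 1/(1 + e^{-t}) and the squared-error loss E = (1/2)(y - f(x))^2, builds a tiny network with two inputs and two hidden units, computes dE/dw_i from the formula above, and checks it against a finite-difference estimate. All weights and values are arbitrary illustrations.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def d_sigmoid(t):
    s = sigmoid(t)
    return s * (1.0 - s)

# Tiny illustrative network: 2 inputs, 2 hidden units, 1 output unit
# (no bias weights, to stay close to the formulas in the text).
a = [0.5, -1.0]                       # inputs a_j
w_hidden = [[0.1, 0.4], [-0.3, 0.2]]  # w_ij: weight from input j to hidden unit i
w_out = [0.7, -0.5]                   # w_i: weight from hidden unit i to the output unit
y = 1.0                               # target value

def forward(w_out):
    # f(x_i): output of hidden unit i, computed from its weighted input x_i
    f_hidden = [sigmoid(sum(w_hidden[i][j] * a[j] for j in range(len(a))))
                for i in range(len(w_hidden))]
    x = sum(w_out[i] * f_hidden[i] for i in range(len(w_out)))  # input to the output unit
    return x, f_hidden

def error(w_out):
    x, _ = forward(w_out)
    return 0.5 * (y - sigmoid(x)) ** 2   # E = 1/2 (y - f(x))^2

# Analytic gradient from the formula above: dE/dw_i = -(y - f(x)) f'(x) f(x_i)
x, f_hidden = forward(w_out)
i = 0
grad_analytic = -(y - sigmoid(x)) * d_sigmoid(x) * f_hidden[i]

# Central finite-difference check on the same weight
eps = 1e-6
w_plus, w_minus = list(w_out), list(w_out)
w_plus[i] += eps
w_minus[i] -= eps
grad_numeric = (error(w_plus) - error(w_minus)) / (2 * eps)

print(grad_analytic, grad_numeric)   # the two estimates should agree closely
```

The finite-difference comparison is only a sanity check; in practice the analytic gradient is used directly in the weight update.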
However, to update the weights w_ij the corresponding derivatives must be calculated. Applying the chain rule gives

\frac{dE}{dw_{ij}} = \frac{dE}{dx}\,\frac{dx}{dw_{ij}} = -(y - f(x))\, f'(x)\, \frac{dx}{dw_{ij}}.

The first two factors are the same as in the previous equation. To compute the third factor, differentiate further. Because
Figure 6.13  Multilayer perceptron with a hidden layer.
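The chain-rule factorization above can also be checked numerically before the third factor is expanded. The sketch below is again an illustration with assumed values rather than anything from the text; it reuses the same assumed sigmoid activation and squared-error loss, estimates dx/dw_ij by finite differences, multiplies it by the first two factors, and compares the product with a direct finite-difference estimate of dE/dw_ij.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def d_sigmoid(t):
    s = sigmoid(t)
    return s * (1.0 - s)

a = [0.5, -1.0]                       # inputs a_j
w_hidden = [[0.1, 0.4], [-0.3, 0.2]]  # w_ij: weight from input j to hidden unit i
w_out = [0.7, -0.5]                   # w_i: weight from hidden unit i to the output unit
y = 1.0
eps = 1e-6

def net_input(w_hidden):
    """x, the weighted input reaching the output unit, as a function of the w_ij."""
    f_hidden = [sigmoid(sum(w_hidden[i][j] * a[j] for j in range(len(a))))
                for i in range(len(w_hidden))]
    return sum(w_out[i] * f_hidden[i] for i in range(len(w_out)))

def error(w_hidden):
    return 0.5 * (y - sigmoid(net_input(w_hidden))) ** 2

i, j = 1, 0   # differentiate with respect to this particular w_ij

def perturbed(delta):
    w = [row[:] for row in w_hidden]   # copy the weights before perturbing one of them
    w[i][j] += delta
    return w

# Third factor dx/dw_ij, estimated numerically (the text goes on to derive
# its closed form by differentiating further).
dx_dwij = (net_input(perturbed(eps)) - net_input(perturbed(-eps))) / (2 * eps)

# Chain-rule product -(y - f(x)) f'(x) dx/dw_ij versus a direct estimate of dE/dw_ij
x = net_input(w_hidden)
grad_chain = -(y - sigmoid(x)) * d_sigmoid(x) * dx_dwij
grad_direct = (error(perturbed(eps)) - error(perturbed(-eps))) / (2 * eps)

print(grad_chain, grad_direct)   # should agree closely
```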