So far so good. But all this assumes that there is no hidden layer. With a hidden layer, things get a little trickier. Suppose $f(x_i)$ is the output of the ith hidden unit, $w_{ij}$ is the weight of the connection from input j to the ith hidden unit, and $w_i$ is the weight of the connection from the ith hidden unit to the output unit. The situation is depicted in Figure 6.13. As before, $f(x)$ is the output of the single unit in the output layer. The update rule for the weights $w_i$ is essentially the same as above, except that $a_i$ is replaced by the output of the ith hidden unit:
$$\frac{dE}{dw_i} = -(y - f(x))\,f'(x)\,f(x_i).$$

However, to update the weights $w_{ij}$ the corresponding derivatives must be calculated. Applying the chain rule gives

$$\frac{dE}{dw_{ij}} = \frac{dE}{dx}\,\frac{dx}{dw_{ij}} = -(y - f(x))\,f'(x)\,\frac{dx}{dw_{ij}}.$$

The first two factors are the same as in the previous equation. To compute the third factor, differentiate further. Because
Figure 6.13 Multilayer perceptron with a hidden layer.
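To make the update rules concrete, here is a minimal sketch in Python (not code from the book) that computes the forward pass and both derivatives for the network of Figure 6.13. It assumes sigmoid activations for $f$ and the squared error $E = \frac{1}{2}(y - f(x))^2$, and it uses the value of the third factor, $dx/dw_{ij} = w_i f'(x_i) a_j$, which the text goes on to derive; the function and variable names are illustrative only.

```python
# Gradient computation for the single-hidden-layer perceptron of Figure 6.13.
# A sketch under the stated assumptions: sigmoid activations f and the
# squared error E = 1/2 (y - f(x))^2; the factor dx/dw_ij = w_i f'(x_i) a_j
# completes the chain rule for the hidden-layer weights.

import numpy as np

def f(t):
    """Sigmoid activation used for hidden and output units."""
    return 1.0 / (1.0 + np.exp(-t))

def f_prime(t):
    """Derivative of the sigmoid: f'(t) = f(t) (1 - f(t))."""
    s = f(t)
    return s * (1.0 - s)

def gradients(a, y, W_hidden, w_out):
    """Return (dE/dw_i, dE/dw_ij) for one training instance.

    a        : input vector (a_0 ... a_k)
    y        : target value
    W_hidden : matrix of weights w_ij, row i holds the weights into hidden unit i
    w_out    : vector of weights w_i from hidden unit i to the output unit
    """
    x_hidden = W_hidden @ a          # x_i = sum_j w_ij a_j
    h = f(x_hidden)                  # f(x_i), outputs of the hidden units
    x = w_out @ h                    # x = sum_i w_i f(x_i)
    out = f(x)                       # f(x), output of the network

    # dE/dw_i = -(y - f(x)) f'(x) f(x_i)
    dE_dx = -(y - out) * f_prime(x)  # the "first two factors"
    dE_dw_out = dE_dx * h

    # dE/dw_ij = dE/dx * dx/dw_ij, with dx/dw_ij = w_i f'(x_i) a_j
    dE_dW_hidden = np.outer(dE_dx * w_out * f_prime(x_hidden), a)

    return dE_dw_out, dE_dW_hidden

# Example: one gradient-descent step with learning rate eta
rng = np.random.default_rng(0)
a = np.array([1.0, 0.5, -0.3])       # inputs a_0, a_1, a_2
W_hidden = rng.normal(size=(2, 3))   # weights w_ij for two hidden units
w_out = rng.normal(size=2)           # weights w_i to the output unit
eta = 0.1
g_out, g_hidden = gradients(a, y=1.0, W_hidden=W_hidden, w_out=w_out)
w_out -= eta * g_out                 # move each weight against the gradient of E
W_hidden -= eta * g_hidden
```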