
6.3 Extending Linear Models


So far so good. But all this assumes that there is no hidden layer. With a hidden layer, things get a little trickier. Suppose $f(x_i)$ is the output of the $i$th hidden unit, $w_{ij}$ is the weight of the connection from input $j$ to the $i$th hidden unit, and $w_i$ is the weight of the connection from the $i$th hidden unit to the output unit. The situation is depicted in Figure 6.13. As before, $f(x)$ is the output of the single unit in the output layer. The update rule for the weights $w_i$ is essentially the same as above, except that $a_i$ is replaced by the output of the $i$th hidden unit:

$$\frac{dE}{dw_i} = -(y - f(x))\,f'(x)\,f(x_i).$$

However, to update the weights $w_{ij}$ the corresponding derivatives must be calculated. Applying the chain rule gives

$$\frac{dE}{dw_{ij}} = \frac{dE}{dx}\,\frac{dx}{dw_{ij}} = -(y - f(x))\,f'(x)\,\frac{dx}{dw_{ij}}.$$

The first two factors are the same as in the previous equation. To compute the third factor, differentiate further. Because $x = \sum_i w_i f(x_i)$ and $x_i = \sum_j w_{ij} a_j$,

$$\frac{dx}{dw_{ij}} = w_i\,f'(x_i)\,a_j.$$
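To make these formulas concrete, here is a minimal sketch (illustrative Java, not code from the Weka software that accompanies the book) that evaluates both derivatives for a tiny network with sigmoid units, for which $f'(x) = f(x)(1 - f(x))$. The class name, example weights, inputs, and target value are arbitrary assumptions chosen for the demonstration.

public class BackpropSketch {

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    // Derivative of the sigmoid: f'(x) = f(x) (1 - f(x)).
    static double sigmoidPrime(double x) {
        double f = sigmoid(x);
        return f * (1.0 - f);
    }

    public static void main(String[] args) {
        double[] a = {1.0, 0.5, -0.3};        // inputs a_0 .. a_k (a_0 = 1 acts as the bias)
        double[][] wij = {{0.2, -0.1, 0.4},   // w_ij: weight from input j to hidden unit i
                          {-0.3, 0.25, 0.1}};
        double[] wi = {0.5, -0.4};            // w_i: weight from hidden unit i to the output unit
        double y = 1.0;                       // target value for this instance

        int l = wi.length, k = a.length;

        // Forward pass: x_i = sum_j w_ij a_j, then x = sum_i w_i f(x_i).
        double[] xi = new double[l];
        double x = 0.0;
        for (int i = 0; i < l; i++) {
            for (int j = 0; j < k; j++) xi[i] += wij[i][j] * a[j];
            x += wi[i] * sigmoid(xi[i]);
        }
        double fx = sigmoid(x);

        // The common first two factors: -(y - f(x)) f'(x).
        double delta = -(y - fx) * sigmoidPrime(x);

        // dE/dw_i = -(y - f(x)) f'(x) f(x_i)   (hidden-to-output weights).
        for (int i = 0; i < l; i++)
            System.out.printf("dE/dw_%d  = %+.6f%n", i, delta * sigmoid(xi[i]));

        // dE/dw_ij = -(y - f(x)) f'(x) w_i f'(x_i) a_j   (input-to-hidden weights;
        // the last three factors are dx/dw_ij from the chain rule above).
        for (int i = 0; i < l; i++)
            for (int j = 0; j < k; j++)
                System.out.printf("dE/dw_%d%d = %+.6f%n", i, j,
                        delta * wi[i] * sigmoidPrime(xi[i]) * a[j]);
    }
}

A useful check on such code is to compare each analytic derivative with the finite-difference approximation $(E(w + \epsilon) - E(w - \epsilon))/(2\epsilon)$ for a small $\epsilon$; the two should agree to several decimal places.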

[Figure: inputs $a_0, a_1, \ldots, a_k$ feed hidden units $0, 1, \ldots, \ell$; each hidden unit $i$ receives input $j$ through weight $w_{ij}$ and sends its output $f(x_i)$ to the single output unit through weight $w_i$; the output unit produces $f(x)$.]

Figure 6.13 Multilayer perceptron with a hidden layer.
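For completeness, here is a sketch of how the derivatives above drive learning: after each training instance, every weight is moved a small step against its derivative, $w \leftarrow w - \eta\,dE/dw$. The method below would sit in the same class as the previous sketch and reuses its sigmoid and sigmoidPrime helpers; the learning rate eta and the epoch count are illustrative assumptions, not values from the text.

// Standard gradient-descent updates using the derivatives derived above.
static void train(double[][] wij, double[] wi,
                  double[][] data, double[] targets,
                  double eta, int epochs) {
    for (int epoch = 0; epoch < epochs; epoch++) {
        for (int n = 0; n < data.length; n++) {
            double[] a = data[n];

            // Forward pass, as in the previous sketch.
            double[] xi = new double[wi.length];
            double x = 0.0;
            for (int i = 0; i < wi.length; i++) {
                for (int j = 0; j < a.length; j++) xi[i] += wij[i][j] * a[j];
                x += wi[i] * sigmoid(xi[i]);
            }
            double delta = -(targets[n] - sigmoid(x)) * sigmoidPrime(x);

            // Update the input-to-hidden weights first, while w_i still
            // holds its old value (both derivatives use the old weights).
            for (int i = 0; i < wi.length; i++)
                for (int j = 0; j < a.length; j++)
                    wij[i][j] -= eta * delta * wi[i] * sigmoidPrime(xi[i]) * a[j];

            // Then update the hidden-to-output weights.
            for (int i = 0; i < wi.length; i++)
                wi[i] -= eta * delta * sigmoid(xi[i]);
        }
    }
}

This per-instance (stochastic) variant is one common choice; the alternative is to sum the derivatives over all training instances before making a single batch update.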
