
the weights becomes equivalent to the Fisher solution (Duda and Hart, 1973). In particular, we shall take the targets for class $\mathcal{C}_1$ to be $N/N_1$, where $N_1$ is the number of patterns in class $\mathcal{C}_1$, and $N$ is the total number of patterns. This target value approximates the reciprocal of the prior probability for class $\mathcal{C}_1$. For class $\mathcal{C}_2$, we shall take the targets to be $-N/N_2$, where $N_2$ is the number of patterns in class $\mathcal{C}_2$.
The sum-of-squares error function can be written

$$
E = \frac{1}{2}\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right)^2. \tag{4.31}
$$
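
As a concrete illustration (not from the text), here is a minimal NumPy sketch of minimising (4.31) under the $\pm N/N_k$ target coding; the synthetic two-class data and all variable names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class data (assumed, not from the text):
# N1 points in class C1, N2 points in class C2, in two dimensions.
N1, N2 = 60, 40
X1 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(N1, 2))
X2 = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(N2, 2))
X = np.vstack([X1, X2])
N = N1 + N2

# Target coding from the text: t_n = N/N1 for C1 and t_n = -N/N2 for C2,
# so that the targets sum to zero, as in (4.35) below.
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])

# Minimising (4.31) is ordinary least squares on the design matrix
# augmented with a column of ones for the bias w0.
Phi = np.hstack([np.ones((N, 1)), X])
coef, *_ = np.linalg.lstsq(Phi, t, rcond=None)
w0, w = coef[0], coef[1:]
```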


Setting the derivatives of $E$ with respect to $w_0$ and $\mathbf{w}$ to zero, we obtain respectively

$$
\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right) = 0 \tag{4.32}
$$

$$
\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right)\mathbf{x}_n = \mathbf{0}. \tag{4.33}
$$

From (4.32), and making use of our choice of target coding scheme for the $t_n$, we obtain an expression for the bias in the form

$$
w_0 = -\mathbf{w}^{\mathrm T}\mathbf{m} \tag{4.34}
$$

where we have used
$$
\sum_{n=1}^{N} t_n = N_1\frac{N}{N_1} - N_2\frac{N}{N_2} = 0 \tag{4.35}
$$

and where $\mathbf{m}$ is the mean of the total data set and is given by

$$
\mathbf{m} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n = \frac{1}{N}\left(N_1\mathbf{m}_1 + N_2\mathbf{m}_2\right). \tag{4.36}
$$
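
For completeness, the intermediate step from (4.32) to (4.34), not spelled out in the text, follows directly from the definitions above: using (4.36) and then (4.35),

$$
\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right)
= N\,\mathbf{w}^{\mathrm T}\mathbf{m} + N w_0 - \underbrace{\sum_{n=1}^{N} t_n}_{=\,0}
= N\left(\mathbf{w}^{\mathrm T}\mathbf{m} + w_0\right) = 0,
$$

which rearranges to $w_0 = -\mathbf{w}^{\mathrm T}\mathbf{m}$, i.e. (4.34).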

After some straightforward algebra, and again making use of the choice of $t_n$, the second equation (4.33) becomes (Exercise 4.6)

$$
\left(\mathbf{S}_{\mathrm W} + \frac{N_1 N_2}{N}\,\mathbf{S}_{\mathrm B}\right)\mathbf{w} = N\left(\mathbf{m}_1 - \mathbf{m}_2\right) \tag{4.37}
$$

where $\mathbf{S}_{\mathrm W}$ is defined by (4.28), $\mathbf{S}_{\mathrm B}$ is defined by (4.27), and we have substituted for the bias using (4.34). Using (4.27), we note that $\mathbf{S}_{\mathrm B}\mathbf{w}$ is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$. Thus we can write

$$
\mathbf{w} \propto \mathbf{S}_{\mathrm W}^{-1}\left(\mathbf{m}_2 - \mathbf{m}_1\right) \tag{4.38}
$$

where we have ignored irrelevant scale factors. Thus the weight vector coincides with that found from the Fisher criterion. In addition, we have also found an expression for the bias value $w_0$ given by (4.34). This tells us that a new vector $\mathbf{x}$ should be classified as belonging to class $\mathcal{C}_1$ if $y(\mathbf{x}) = \mathbf{w}^{\mathrm T}(\mathbf{x} - \mathbf{m}) > 0$ and class $\mathcal{C}_2$ otherwise.
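
To see the equivalence numerically, the following sketch (same assumed synthetic data and variable names as in the earlier block) compares the least-squares weight vector with the Fisher direction $\mathbf{S}_{\mathrm W}^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$, and checks the bias formula (4.34) and the decision rule; it is an illustration under those assumptions, not the text's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same assumed two-class data and least-squares fit as in the earlier sketch.
N1, N2 = 60, 40
X1 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(N1, 2))
X2 = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(N2, 2))
X = np.vstack([X1, X2])
N = N1 + N2
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])
Phi = np.hstack([np.ones((N, 1)), X])
coef, *_ = np.linalg.lstsq(Phi, t, rcond=None)
w0, w = coef[0], coef[1:]

# Fisher direction S_W^{-1}(m2 - m1), with S_W the within-class scatter (4.28).
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(SW, m2 - m1)

# The two directions agree up to sign and scale (the "irrelevant scale
# factors" of the text), and the bias satisfies w0 = -w^T m as in (4.34).
cosine = w @ w_fisher / (np.linalg.norm(w) * np.linalg.norm(w_fisher))
m = X.mean(axis=0)
print(abs(cosine))               # ~ 1.0
print(np.isclose(w0, -(w @ m)))  # True

# Decision rule from the text: assign x to C1 if y(x) = w^T (x - m) > 0.
x_new = np.array([0.8, 1.2])
print("C1" if w @ (x_new - m) > 0 else "C2")
```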