
the weights becomes equivalent to the Fisher solution (Duda and Hart, 1973). In particular, we shall take the targets for class $\mathcal{C}_1$ to be $N/N_1$, where $N_1$ is the number of patterns in class $\mathcal{C}_1$, and $N$ is the total number of patterns. This target value approximates the reciprocal of the prior probability for class $\mathcal{C}_1$. For class $\mathcal{C}_2$, we shall take the targets to be $-N/N_2$, where $N_2$ is the number of patterns in class $\mathcal{C}_2$.
The sum-of-squares error function can be written

$$
E = \frac{1}{2}\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right)^2. \tag{4.31}
$$
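
As a concrete illustration (not from the text), here is a minimal NumPy sketch of minimising (4.31) under the $\pm N/N_k$ target coding; the synthetic two-class data and all variable names are my own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative two-class data (assumed, not from the text):
# N1 points in class C1, N2 points in class C2, in two dimensions.
N1, N2 = 60, 40
X1 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(N1, 2))
X2 = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(N2, 2))
X = np.vstack([X1, X2])
N = N1 + N2

# Target coding from the text: t_n = N/N1 for C1 and t_n = -N/N2 for C2,
# so that the targets sum to zero, as in (4.35) below.
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])

# Minimising (4.31) is ordinary least squares on the design matrix
# augmented with a column of ones for the bias w0.
Phi = np.hstack([np.ones((N, 1)), X])
coef, *_ = np.linalg.lstsq(Phi, t, rcond=None)
w0, w = coef[0], coef[1:]
```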


Setting the derivatives of $E$ with respect to $w_0$ and $\mathbf{w}$ to zero, we obtain respectively

$$
\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right) = 0 \tag{4.32}
$$

$$
\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right)\mathbf{x}_n = \mathbf{0}. \tag{4.33}
$$

From (4.32), and making use of our choice of target coding scheme for the $t_n$, we obtain an expression for the bias in the form

$$
w_0 = -\mathbf{w}^{\mathrm T}\mathbf{m} \tag{4.34}
$$

where we have used
$$
\sum_{n=1}^{N} t_n = N_1\frac{N}{N_1} - N_2\frac{N}{N_2} = 0 \tag{4.35}
$$

and where $\mathbf{m}$ is the mean of the total data set and is given by

$$
\mathbf{m} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n = \frac{1}{N}\left(N_1\mathbf{m}_1 + N_2\mathbf{m}_2\right). \tag{4.36}
$$
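
For completeness, the intermediate step from (4.32) to (4.34), not spelled out in the text, follows directly from the definitions above: using (4.36) and then (4.35),

$$
\sum_{n=1}^{N}\left(\mathbf{w}^{\mathrm T}\mathbf{x}_n + w_0 - t_n\right)
= N\,\mathbf{w}^{\mathrm T}\mathbf{m} + N w_0 - \underbrace{\sum_{n=1}^{N} t_n}_{=\,0}
= N\left(\mathbf{w}^{\mathrm T}\mathbf{m} + w_0\right) = 0,
$$

which rearranges to $w_0 = -\mathbf{w}^{\mathrm T}\mathbf{m}$, i.e. (4.34).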

After some straightforward algebra, and again making use of the choice of $t_n$, the second equation (4.33) becomes (Exercise 4.6)

$$
\left(\mathbf{S}_{\mathrm W} + \frac{N_1 N_2}{N}\,\mathbf{S}_{\mathrm B}\right)\mathbf{w} = N\left(\mathbf{m}_1 - \mathbf{m}_2\right) \tag{4.37}
$$

where $\mathbf{S}_{\mathrm W}$ is defined by (4.28), $\mathbf{S}_{\mathrm B}$ is defined by (4.27), and we have substituted for the bias using (4.34). Using (4.27), we note that $\mathbf{S}_{\mathrm B}\mathbf{w}$ is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$. Thus we can write

$$
\mathbf{w} \propto \mathbf{S}_{\mathrm W}^{-1}\left(\mathbf{m}_2 - \mathbf{m}_1\right) \tag{4.38}
$$

where we have ignored irrelevant scale factors. Thus the weight vector coincides with that found from the Fisher criterion. In addition, we have also found an expression for the bias value $w_0$ given by (4.34). This tells us that a new vector $\mathbf{x}$ should be classified as belonging to class $\mathcal{C}_1$ if $y(\mathbf{x}) = \mathbf{w}^{\mathrm T}(\mathbf{x} - \mathbf{m}) > 0$ and class $\mathcal{C}_2$ otherwise.
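
To see the equivalence numerically, the following sketch (same assumed synthetic data and variable names as in the earlier block) compares the least-squares weight vector with the Fisher direction $\mathbf{S}_{\mathrm W}^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$, and checks the bias formula (4.34) and the decision rule; it is an illustration under those assumptions, not the text's own code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Same assumed two-class data and least-squares fit as in the earlier sketch.
N1, N2 = 60, 40
X1 = rng.normal(loc=[1.0, 1.0], scale=0.5, size=(N1, 2))
X2 = rng.normal(loc=[-1.0, -1.0], scale=0.5, size=(N2, 2))
X = np.vstack([X1, X2])
N = N1 + N2
t = np.concatenate([np.full(N1, N / N1), np.full(N2, -N / N2)])
Phi = np.hstack([np.ones((N, 1)), X])
coef, *_ = np.linalg.lstsq(Phi, t, rcond=None)
w0, w = coef[0], coef[1:]

# Fisher direction S_W^{-1}(m2 - m1), with S_W the within-class scatter (4.28).
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
SW = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
w_fisher = np.linalg.solve(SW, m2 - m1)

# The two directions agree up to sign and scale (the "irrelevant scale
# factors" of the text), and the bias satisfies w0 = -w^T m as in (4.34).
cosine = w @ w_fisher / (np.linalg.norm(w) * np.linalg.norm(w_fisher))
m = X.mean(axis=0)
print(abs(cosine))               # ~ 1.0
print(np.isclose(w0, -(w @ m)))  # True

# Decision rule from the text: assign x to C1 if y(x) = w^T (x - m) > 0.
x_new = np.array([0.8, 1.2])
print("C1" if w @ (x_new - m) > 0 else "C2")
```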