the weights becomes equivalent to the Fisher solution (Duda and Hart, 1973). In particular, we shall take the targets for class $\mathcal{C}_1$ to be $N/N_1$, where $N_1$ is the number of patterns in class $\mathcal{C}_1$, and $N$ is the total number of patterns. This target value approximates the reciprocal of the prior probability for class $\mathcal{C}_1$. For class $\mathcal{C}_2$, we shall take the targets to be $-N/N_2$, where $N_2$ is the number of patterns in class $\mathcal{C}_2$. The sum-of-squares error function can be written

$$
E = \frac{1}{2} \sum_{n=1}^{N} \left( \mathbf{w}^{\mathrm{T}} \mathbf{x}_n + w_0 - t_n \right)^2. \tag{4.31}
$$
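As a concrete illustration of this target coding and of minimizing (4.31), the following sketch (not from the text; it assumes NumPy and a small synthetic two-class data set, with the patterns $\mathbf{x}_n$ stacked row-wise in a hypothetical array X and class labels in y) solves the least-squares problem for $\mathbf{w}$ and $w_0$ directly.

    import numpy as np

    # Synthetic two-class data (illustrative only): rows of X are the patterns x_n.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(40, 2)),   # class C1
                   rng.normal([3.0, 2.0], 1.0, size=(60, 2))])  # class C2
    y = np.array([0] * 40 + [1] * 60)                           # 0 -> C1, 1 -> C2

    N, N1, N2 = len(y), int(np.sum(y == 0)), int(np.sum(y == 1))

    # Target coding from the text: t_n = N/N1 for class C1 and t_n = -N/N2 for class C2.
    t = np.where(y == 0, N / N1, -N / N2)

    # Minimize the sum-of-squares error (4.31) over (w, w0) via least squares
    # on the design matrix augmented with a column of ones for the bias.
    X_aug = np.hstack([X, np.ones((N, 1))])
    sol, *_ = np.linalg.lstsq(X_aug, t, rcond=None)
    w, w0 = sol[:2], sol[2]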
Setting the derivatives of $E$ with respect to $w_0$ and $\mathbf{w}$ to zero, we obtain respectively

$$
\sum_{n=1}^{N} \left( \mathbf{w}^{\mathrm{T}} \mathbf{x}_n + w_0 - t_n \right) = 0 \tag{4.32}
$$

$$
\sum_{n=1}^{N} \left( \mathbf{w}^{\mathrm{T}} \mathbf{x}_n + w_0 - t_n \right) \mathbf{x}_n = 0. \tag{4.33}
$$
From (4.32), and making use of our choice of target coding scheme for the $t_n$, we obtain an expression for the bias in the form

$$
w_0 = -\mathbf{w}^{\mathrm{T}} \mathbf{m} \tag{4.34}
$$
where we have used
$$
\sum_{n=1}^{N} t_n = N_1 \frac{N}{N_1} - N_2 \frac{N}{N_2} = 0 \tag{4.35}
$$
and where $\mathbf{m}$ is the mean of the total data set and is given by

$$
\mathbf{m} = \frac{1}{N} \sum_{n=1}^{N} \mathbf{x}_n = \frac{1}{N} \left( N_1 \mathbf{m}_1 + N_2 \mathbf{m}_2 \right). \tag{4.36}
$$
After some straightforward algebra, and again making use of the choice of $t_n$, the second equation (4.33) becomes (Exercise 4.6)

$$
\left( \mathbf{S}_{\mathrm{W}} + \frac{N_1 N_2}{N} \mathbf{S}_{\mathrm{B}} \right) \mathbf{w} = N \left( \mathbf{m}_1 - \mathbf{m}_2 \right) \tag{4.37}
$$
where $\mathbf{S}_{\mathrm{W}}$ is defined by (4.28), $\mathbf{S}_{\mathrm{B}}$ is defined by (4.27), and we have substituted for the bias using (4.34). Using (4.27), we note that $\mathbf{S}_{\mathrm{B}} \mathbf{w}$ is always in the direction of $(\mathbf{m}_2 - \mathbf{m}_1)$. Thus we can write

$$
\mathbf{w} \propto \mathbf{S}_{\mathrm{W}}^{-1} (\mathbf{m}_2 - \mathbf{m}_1) \tag{4.38}
$$

where we have ignored irrelevant scale factors. Thus the weight vector coincides with that found from the Fisher criterion. In addition, we have also found an expression for the bias value $w_0$ given by (4.34). This tells us that a new vector $\mathbf{x}$ should be classified as belonging to class $\mathcal{C}_1$ if $y(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}(\mathbf{x} - \mathbf{m}) > 0$ and to class $\mathcal{C}_2$ otherwise.
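The equivalence derived above can also be checked numerically. Continuing the sketch given after (4.31) (again an illustration under the same assumptions, not part of the text; $\mathbf{S}_{\mathrm{W}}$ is the within-class scatter of (4.28) computed from the synthetic data), the least-squares $\mathbf{w}$ should be parallel to $\mathbf{S}_{\mathrm{W}}^{-1}(\mathbf{m}_2 - \mathbf{m}_1)$, the bias should satisfy (4.34), and the decision rule is $y(\mathbf{x}) = \mathbf{w}^{\mathrm{T}}(\mathbf{x} - \mathbf{m}) > 0$ for class $\mathcal{C}_1$.

    # Class means m1, m2 and total mean m, as in (4.36).
    m1, m2 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    m = (N1 * m1 + N2 * m2) / N

    # Within-class scatter matrix S_W, as defined by (4.28).
    S_W = ((X[y == 0] - m1).T @ (X[y == 0] - m1)
           + (X[y == 1] - m2).T @ (X[y == 1] - m2))
    w_fisher = np.linalg.solve(S_W, m2 - m1)

    # Up to an irrelevant scale factor (and sign), the least-squares w
    # should coincide with the Fisher direction (4.38).
    cos = w @ w_fisher / (np.linalg.norm(w) * np.linalg.norm(w_fisher))
    print(abs(cos))                         # close to 1
    print(np.allclose(w0, -w @ m))          # bias relation (4.34)

    # Decision rule: assign x to C1 if w^T (x - m) > 0, otherwise to C2.
    x_new = np.array([0.5, 0.5])
    print("C1" if w @ (x_new - m) > 0 else "C2")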