where we have made use of (4.88). Also, we have introduced the $N \times N$ diagonal matrix $\mathbf{R}$ with elements

$$R_{nn} = y_n(1 - y_n). \qquad (4.98)$$
We see that the Hessian is no longer constant but depends on $\mathbf{w}$ through the weighting matrix $\mathbf{R}$, corresponding to the fact that the error function is no longer quadratic. Using the property $0 < y_n < 1$, which follows from the form of the logistic sigmoid function, we see that $\mathbf{u}^{\mathrm{T}}\mathbf{H}\mathbf{u} > 0$ for an arbitrary vector $\mathbf{u}$, and so the Hessian matrix $\mathbf{H}$ is positive definite. It follows that the error function is a convex function of $\mathbf{w}$ and hence has a unique minimum (Exercise 4.15).
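As a quick numerical illustration (not part of the text), the following sketch builds the Hessian $\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi}$ for a randomly generated design matrix and checks that its eigenvalues are positive, consistent with the positive-definiteness argument above; the variable names and the use of NumPy are assumptions of this example, and positive definiteness also presupposes a full-column-rank design matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 100, 4
Phi = rng.normal(size=(N, M))            # design matrix (assumed full column rank)
w = rng.normal(size=M)                   # an arbitrary current weight vector

y = 1.0 / (1.0 + np.exp(-(Phi @ w)))     # y_n = sigma(w^T phi_n), so 0 < y_n < 1
R = np.diag(y * (1.0 - y))               # diagonal weighting matrix, eq. (4.98)
H = Phi.T @ R @ Phi                      # Hessian of the cross-entropy error

print(np.all(np.linalg.eigvalsh(H) > 0)) # True: H is positive definite
```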
The Newton-Raphson update formula for the logistic regression model then becomes

$$
\begin{aligned}
\mathbf{w}^{(\text{new})} &= \mathbf{w}^{(\text{old})} - (\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}(\mathbf{y} - \mathbf{t}) \\
&= (\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi})^{-1}\left\{ \boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi}\,\mathbf{w}^{(\text{old})} - \boldsymbol{\Phi}^{\mathrm{T}}(\mathbf{y} - \mathbf{t}) \right\} \\
&= (\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^{\mathrm{T}}\mathbf{R}\mathbf{z} \qquad (4.99)
\end{aligned}
$$

where $\mathbf{z}$ is an $N$-dimensional vector with elements

$$\mathbf{z} = \boldsymbol{\Phi}\mathbf{w}^{(\text{old})} - \mathbf{R}^{-1}(\mathbf{y} - \mathbf{t}). \qquad (4.100)$$
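As a minimal sketch (the function name and the use of NumPy are illustrative assumptions, not from the text), a single update of the form (4.99)-(4.100) can be written as one weighted least-squares solve:

```python
import numpy as np

def newton_step(Phi, t, w_old):
    """One Newton-Raphson step for logistic regression, eqs. (4.99)-(4.100)."""
    y = 1.0 / (1.0 + np.exp(-(Phi @ w_old)))   # y_n = sigma(w_old^T phi_n)
    r = y * (1.0 - y)                          # diagonal of R, eq. (4.98)
    z = Phi @ w_old - (y - t) / r              # effective targets, eq. (4.100)
    A = Phi.T @ (r[:, None] * Phi)             # Phi^T R Phi
    b = Phi.T @ (r * z)                        # Phi^T R z
    return np.linalg.solve(A, b)               # solve the weighted normal equations
```

Writing the step this way makes explicit the structure discussed next: it is the same solve as ordinary least squares, but with weights $r_n = y_n(1 - y_n)$ and targets $z_n$.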

We see that the update formula (4.99) takes the form of a set of normal equations for a weighted least-squares problem. Because the weighting matrix $\mathbf{R}$ is not constant but depends on the parameter vector $\mathbf{w}$, we must apply the normal equations iteratively, each time using the new weight vector $\mathbf{w}$ to compute a revised weighting matrix $\mathbf{R}$. For this reason, the algorithm is known as iterative reweighted least squares, or IRLS (Rubin, 1983). As in the weighted least-squares problem, the elements of the diagonal weighting matrix $\mathbf{R}$ can be interpreted as variances because the mean and variance of $t$ in the logistic regression model are given by

$$\mathbb{E}[t] = \sigma(\mathbf{x}) = y \qquad (4.101)$$
$$\operatorname{var}[t] = \mathbb{E}[t^2] - \mathbb{E}[t]^2 = \sigma(\mathbf{x}) - \sigma(\mathbf{x})^2 = y(1 - y) \qquad (4.102)$$

where we have used the property $t^2 = t$ for $t \in \{0, 1\}$. In fact, we can interpret IRLS as the solution to a linearized problem in the space of the variable $a = \mathbf{w}^{\mathrm{T}}\boldsymbol{\phi}$. The quantity $z_n$, which corresponds to the $n$th element of $\mathbf{z}$, can then be given a simple interpretation as an effective target value in this space obtained by making a local linear approximation to the logistic sigmoid function around the current operating point $\mathbf{w}^{(\text{old})}$

$$
\begin{aligned}
a_n(\mathbf{w}) &\simeq a_n(\mathbf{w}^{(\text{old})}) + \left.\frac{\mathrm{d}a_n}{\mathrm{d}y_n}\right|_{\mathbf{w}^{(\text{old})}} (t_n - y_n) \\
&= \boldsymbol{\phi}_n^{\mathrm{T}}\mathbf{w}^{(\text{old})} - \frac{(y_n - t_n)}{y_n(1 - y_n)} = z_n. \qquad (4.103)
\end{aligned}
$$
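The full IRLS procedure then simply repeats the weighted least-squares solve, recomputing $\mathbf{R}$ and $\mathbf{z}$ from the current weights at each pass. The following self-contained sketch is one way this might look (illustrative names, NumPy assumed; the small floor on $r_n$ guards against $y_n$ reaching 0 or 1 numerically, and no regularization is included), applied to a toy two-class problem:

```python
import numpy as np

def irls(Phi, t, max_iter=100, tol=1e-8):
    """Iterative reweighted least squares for logistic regression."""
    w = np.zeros(Phi.shape[1])                      # start from w = 0, i.e. all y_n = 0.5
    for _ in range(max_iter):
        y = 1.0 / (1.0 + np.exp(-(Phi @ w)))        # current predictions
        r = np.clip(y * (1.0 - y), 1e-12, None)     # R_nn, eq. (4.98), floored for safety
        z = Phi @ w - (y - t) / r                   # effective targets, eq. (4.100)
        w_new = np.linalg.solve(Phi.T @ (r[:, None] * Phi),
                                Phi.T @ (r * z))    # weighted normal equations, eq. (4.99)
        if np.max(np.abs(w_new - w)) < tol:
            return w_new
        w = w_new
    return w

# Toy usage: two overlapping Gaussian classes with a bias basis function.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
t = np.concatenate([np.zeros(50), np.ones(50)])
Phi = np.hstack([np.ones((100, 1)), X])             # phi(x) = (1, x1, x2)
print(irls(Phi, t))
```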