Pattern Recognition and Machine Learning

4.3. Probabilistic Discriminative Models

4.3.3 Iterative reweighted least squares


In the case of the linear regression models discussed in Chapter 3, the maximum likelihood solution, on the assumption of a Gaussian noise model, leads to a closed-form solution. This was a consequence of the quadratic dependence of the log likelihood function on the parameter vector w. For logistic regression, there is no longer a closed-form solution, due to the nonlinearity of the logistic sigmoid function. However, the departure from a quadratic form is not substantial. To be precise, the error function is convex, as we shall see shortly, and hence has a unique minimum. Furthermore, the error function can be minimized by an efficient iterative technique based on the Newton-Raphson iterative optimization scheme, which uses a local quadratic approximation to the log likelihood function. The Newton-Raphson update, for minimizing a function E(w), takes the form (Fletcher, 1987; Bishop and Nabney, 2008)

w^{(new)} = w^{(old)} - H^{-1} \nabla E(w)    (4.92)

where H is the Hessian matrix whose elements comprise the second derivatives of E(w) with respect to the components of w.
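The update (4.92) can be sketched directly in code. The following is a minimal illustration, not part of the text: `newton_raphson`, `grad`, and `hess` are hypothetical names, and the quadratic test function E(w) = ||w - c||^2 is chosen only because its minimizer is known to be c.

```python
import numpy as np

def newton_raphson(grad, hess, w0, n_iter=20, tol=1e-10):
    """Minimize E(w) by the update w_new = w_old - H^{-1} grad E(w), Eq. (4.92)."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        # Solve H step = grad E(w) rather than forming H^{-1} explicitly.
        step = np.linalg.solve(hess(w), grad(w))
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w

# Illustrative test function: E(w) = ||w - c||^2, minimized at w = c.
c = np.array([1.0, -2.0])
grad = lambda w: 2.0 * (w - c)          # gradient of E
hess = lambda w: 2.0 * np.eye(2)        # Hessian of E (constant here)
w_star = newton_raphson(grad, hess, np.zeros(2))
```

Because this test function is exactly quadratic, a single Newton-Raphson step already lands on the minimum, which anticipates the linear-regression result below.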
Let us first of all apply the Newton-Raphson method to the linear regression
model (3.3) with the sum-of-squares error function (3.12). The gradient and Hessian
of this error function are given by

\nabla E(w) = \sum_{n=1}^{N} (w^T \phi_n - t_n) \phi_n = \Phi^T \Phi w - \Phi^T t    (4.93)

H = \nabla\nabla E(w) = \sum_{n=1}^{N} \phi_n \phi_n^T = \Phi^T \Phi    (4.94)

Section 3.1.1  where \Phi is the N \times M design matrix, whose nth row is given by \phi_n^T. The Newton-Raphson update then takes the form


w^{(new)} = w^{(old)} - (\Phi^T \Phi)^{-1} \{ \Phi^T \Phi w^{(old)} - \Phi^T t \}
          = (\Phi^T \Phi)^{-1} \Phi^T t    (4.95)

which we recognize as the standard least-squares solution. Note that the error func-
tion in this case is quadratic and hence the Newton-Raphson formula gives the exact
solution in one step.
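The one-step property is easy to check numerically. The sketch below (with randomly generated `Phi` and `t` for illustration) applies a single update (4.95) from an arbitrary starting point and compares against NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 3))   # design matrix, rows phi_n^T
t = rng.normal(size=50)          # target values

w_old = rng.normal(size=3)       # arbitrary starting point
grad = Phi.T @ Phi @ w_old - Phi.T @ t   # gradient, Eq. (4.93)
H = Phi.T @ Phi                          # Hessian, Eq. (4.94)
w_new = w_old - np.linalg.solve(H, grad) # one Newton-Raphson step

# Standard least-squares solution for comparison.
w_ls = np.linalg.lstsq(Phi, t, rcond=None)[0]
```

Since the sum-of-squares error is exactly quadratic, `w_new` coincides with `w_ls` no matter where `w_old` starts.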
Now let us apply the Newton-Raphson update to the cross-entropy error function
(4.90) for the logistic regression model. From (4.91) we see that the gradient and
Hessian of this error function are given by

\nabla E(w) = \sum_{n=1}^{N} (y_n - t_n) \phi_n = \Phi^T (y - t)    (4.96)

H = \nabla\nabla E(w) = \sum_{n=1}^{N} y_n (1 - y_n) \phi_n \phi_n^T = \Phi^T R \Phi    (4.97)

where R is the N \times N diagonal weighting matrix with elements R_{nn} = y_n(1 - y_n).
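Putting (4.92), (4.96), and (4.97) together gives the iterative reweighted least squares procedure for logistic regression. The following is a minimal sketch, assuming a design matrix with a bias column; the function name `irls_logistic` and the synthetic data are illustrative, not from the text, and no safeguards for separable data (where the maximum likelihood weights diverge) are included.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls_logistic(Phi, t, n_iter=100, tol=1e-8):
    """Fit logistic regression by repeated Newton-Raphson steps (IRLS)."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        y = sigmoid(Phi @ w)
        grad = Phi.T @ (y - t)        # gradient, Eq. (4.96)
        R = np.diag(y * (1.0 - y))    # diagonal weighting matrix
        H = Phi.T @ R @ Phi           # Hessian, Eq. (4.97)
        step = np.linalg.solve(H, grad)
        w = w - step
        if np.linalg.norm(step) < tol:
            break
    return w

# Illustrative synthetic data: labels drawn from a known logistic model.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
Phi = np.hstack([np.ones((200, 1)), X])  # basis: bias term plus raw inputs
w_true = np.array([0.5, 2.0, -1.0])
t = (rng.uniform(size=200) < sigmoid(Phi @ w_true)).astype(float)
w_hat = irls_logistic(Phi, t)
```

Each iteration solves a weighted least-squares problem with weights R_{nn} = y_n(1 - y_n) that change as w is updated, which is the origin of the name "iterative reweighted least squares".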