Pattern Recognition and Machine Learning

5.5. Regularization in Neural Networks 271

The effect of the regularization term is therefore to pull each weight towards the
centre of the $j$th Gaussian, with a force proportional to the posterior probability of
that Gaussian for the given weight. This is precisely the kind of effect that we are
seeking.
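The sketch below makes the "pull" concrete. It computes the responsibilities $\gamma_j(w_i)$ and the gradient of the penalty with respect to each weight. It assumes the soft weight-sharing penalty $\lambda\Omega(\mathbf{w})$ with $\Omega(\mathbf{w}) = -\sum_i \ln \sum_j \pi_j \mathcal{N}(w_i|\mu_j,\sigma_j^2)$, and the usual mixture posterior $\gamma_j(w) = \pi_j\mathcal{N}(w|\mu_j,\sigma_j^2) / \sum_k \pi_k\mathcal{N}(w|\mu_k,\sigma_k^2)$. The data-error term $E$ is left out. The function names and the mixture parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate normal density N(x | mu, var), broadcasting over arrays."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def responsibilities(w, pi, mu, var):
    """gamma[i, j]: posterior probability that weight w[i] came from component j."""
    # dens[i, j] = pi_j * N(w_i | mu_j, var_j), shape (n_weights, n_components)
    dens = pi[None, :] * gaussian_pdf(w[:, None], mu[None, :], var[None, :])
    return dens / dens.sum(axis=1, keepdims=True)

def reg_grad_w(w, pi, mu, var, lam):
    """Gradient of lam * Omega(w) w.r.t. each weight: sum_j gamma_j(w_i)(w_i - mu_j)/var_j."""
    gamma = responsibilities(w, pi, mu, var)
    return lam * np.sum(gamma * (w[:, None] - mu[None, :]) / var[None, :], axis=1)

# Hypothetical two-component mixture: a narrow component at 0, a wider one at 1.
w = np.array([-0.05, 0.1, 0.9, 1.2])
pi = np.array([0.5, 0.5])
mu = np.array([0.0, 1.0])
var = np.array([0.01, 0.04])

gamma = responsibilities(w, pi, mu, var)
grad = reg_grad_w(w, pi, mu, var, lam=1.0)
print(gamma.round(3))
print(grad)
```

A descent step `w -= lr * grad` moves each weight towards the centre that "claims" it. The size of that move is weighted by the corresponding responsibility.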
Derivatives of the error with respect to the centres of the Gaussians are also
easily computed (Exercise 5.30) to give


$$\frac{\partial \widetilde{E}}{\partial \mu_j} = \lambda \sum_i \gamma_j(w_i)\, \frac{(\mu_j - w_i)}{\sigma_j^2} \qquad (5.142)$$

which has a simple intuitive interpretation, because it pushes $\mu_j$ towards an
average of the weight values, weighted by the posterior probabilities that the respective
weight parameters were generated by component $j$. Similarly (Exercise 5.31), the
derivatives with respect to the variances are given by


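$$\frac{\partial \widetilde{E}}{\partial \sigma_j} = \lambda \sum_i \gamma_j(w_i) \left( \frac{1}{\sigma_j} - \frac{(w_i - \mu_j)^2}{\sigma_j^3} \right) \qquad (5.143)$$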

which drives $\sigma_j$ towards the weighted average of the squared deviations of the weights
around the corresponding centre $\mu_j$, where the weighting coefficients are again given
by the posterior probability that each weight is generated by component $j$. Note that
in a practical implementation, new variables $\xi_j$ defined by

$$\sigma_j^2 = \exp(\xi_j) \qquad (5.144)$$

are introduced, and the minimization is performed with respect to the $\xi_j$. This
ensures that the parameters $\sigma_j$ remain positive. It also has the effect of discouraging
pathological solutions in which one or more of the $\sigma_j$ goes to zero, corresponding
to a Gaussian component collapsing onto one of the weight parameter values. Such
solutions are discussed in more detail in the context of Gaussian mixture models in
Section 9.2.1.
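Equations (5.142) and (5.143) are easy to check numerically. The sketch below assumes the penalty $\lambda\Omega$ with $\Omega(\mathbf{w}) = -\sum_i \ln \sum_j \pi_j \mathcal{N}(w_i|\mu_j,\sigma_j^2)$. It implements both gradients and then applies the chain rule for the log-variance parameterization of (5.144). Writing $s_j = \ln \sigma_j^2$ for that variable, we have $\sigma_j = \exp(s_j/2)$, so $\partial \widetilde{E}/\partial s_j = (\sigma_j/2)\,\partial \widetilde{E}/\partial \sigma_j$. The helper names, the random weights and the mixture settings are all hypothetical.

```python
import numpy as np

def mixture_terms(w, pi, mu, sigma):
    """Return gamma[i, j] and Omega = -sum_i ln sum_j pi_j N(w_i | mu_j, sigma_j^2)."""
    var = sigma ** 2
    dens = pi * np.exp(-0.5 * (w[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    total = dens.sum(axis=1)
    return dens / total[:, None], -np.sum(np.log(total))

def grads_mu_sigma(w, pi, mu, sigma, lam):
    """Gradients (5.142) and (5.143) of lam * Omega w.r.t. mu_j and sigma_j."""
    gamma, _ = mixture_terms(w, pi, mu, sigma)
    diff = w[:, None] - mu                                   # diff[i, j] = w_i - mu_j
    g_mu = lam * np.sum(gamma * (-diff) / sigma ** 2, axis=0)  # (mu_j - w_i) / sigma_j^2
    g_sigma = lam * np.sum(gamma * (1.0 / sigma - diff ** 2 / sigma ** 3), axis=0)
    return g_mu, g_sigma

def grad_log_var(g_sigma, sigma):
    """Chain rule for s_j = ln sigma_j^2: d sigma_j / d s_j = sigma_j / 2."""
    return 0.5 * sigma * g_sigma

rng = np.random.default_rng(0)
w = rng.normal(size=20)            # stand-in for a network's weights
pi = np.array([0.3, 0.7])
mu = np.array([-0.5, 0.8])
sigma = np.array([0.4, 0.6])
lam = 0.1

g_mu, g_sigma = grads_mu_sigma(w, pi, mu, sigma, lam)
g_s = grad_log_var(g_sigma, sigma)

# One descent step in s = ln sigma^2; mapping back through exp keeps sigma positive.
s = np.log(sigma ** 2) - 0.1 * g_s
print(np.exp(0.5 * s))
```

The update is taken in $s_j$ rather than $\sigma_j$, so no step size can drive a standard deviation negative. This is the motivation for (5.144) given in the text.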
For the derivatives with respect to the mixing coefficients $\pi_j$, we need to take
account of the constraints

$$\sum_j \pi_j = 1, \qquad 0 \leqslant \pi_i \leqslant 1 \qquad (5.145)$$

which follow from the interpretation of the $\pi_j$ as prior probabilities. This can be
done by expressing the mixing coefficients in terms of a set of auxiliary variables
$\{\eta_j\}$ using the softmax function given by

$$\pi_j = \frac{\exp(\eta_j)}{\sum_{k=1}^{M} \exp(\eta_k)}. \qquad (5.146)$$
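Here is a minimal sketch of the softmax parameterization (5.146). Any real vector $\eta$ maps to mixing coefficients that satisfy (5.145) automatically. The code subtracts $\max_k \eta_k$ before exponentiating. This is a standard overflow guard and leaves $\pi$ unchanged, because adding the same constant to every $\eta_j$ cancels in the ratio. The sketch also includes the standard softmax Jacobian $\partial \pi_k/\partial \eta_j = \pi_k(\delta_{kj} - \pi_j)$, through which gradients with respect to the $\eta_j$ follow by the chain rule. The function names are illustrative.

```python
import numpy as np

def softmax(eta):
    """Mixing coefficients pi_j = exp(eta_j) / sum_k exp(eta_k), as in (5.146)."""
    z = np.exp(eta - np.max(eta))   # shift for numerical stability; pi is unchanged
    return z / z.sum()

def softmax_jacobian(eta):
    """J[k, j] = d pi_k / d eta_j = pi_k * (delta_kj - pi_j)."""
    pi = softmax(eta)
    return np.diag(pi) - np.outer(pi, pi)

eta = np.array([2.0, -1.0, 0.5, 300.0])   # unconstrained; a large entry does not overflow
pi = softmax(eta)
print(pi, pi.sum())
print(softmax_jacobian(np.array([0.3, -1.2, 0.8])))
```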

The derivatives of the regularized error function with respect to the $\{\eta_j\}$ then take
the form (Exercise 5.32)
