Pattern Recognition and Machine Learning

(Jeff_L) #1
10.5. Local Variational Methods 495

Now, instead of fixingλand varyingx, we can consider a particularxand then
adjustλuntil the tangent plane is tangent at that particularx. Because theyvalue
of the tangent line at a particularxis maximized when that value coincides with its
contact point, we have
f(x) = max
λ

{λx−g(λ)}. (10.130)

We see that the functionsf(x)andg(λ)play a dual role, and are related through
(10.129) and (10.130).
Let us apply these duality relations to our simple examplef(x)=exp(−x).
From (10.129) we see that the maximizing value ofxis given byξ=−ln(−λ), and
back-substituting we obtain the conjugate functiong(λ)in the form

g(λ)=λ−λln(−λ) (10.131)

as obtained previously. The functionλξ−g(λ)is shown, forξ=1in the right-hand
plot in Figure 10.10. As a check, we can substitute (10.131) into (10.130), which
gives the maximizing value ofλ=−exp(−x), and back-substituting then recovers
the original functionf(x)=exp(−x).
For concave functions, we can follow a similar argument to obtain upper bounds,
in whichmax’ is replaced with ‘min’, so that

f(x) = min
λ

{λx−g(λ)} (10.132)

g(λ) = min
x
{λx−f(x)}. (10.133)

If the function of interest is not convex (or concave), then we cannot directly
apply the method above to obtain a bound. However, we can first seek invertible
transformations either of the function or of its argument which change it into a con-
vex form. We then calculate the conjugate function and then transform back to the
original variables.
An important example, which arises frequently in pattern recognition, is the
logistic sigmoid function defined by

σ(x)=

1

1+e−x

. (10.134)

As it stands this function is neither convex nor concave. However, if we take the
logarithm we obtain a function which is concave, as is easily verified by finding the
Exercise 10.30 second derivative. From (10.133) the corresponding conjugate function then takes
the form


g(λ)=min
x

{λx−f(x)}=−λlnλ−(1−λ)ln(1−λ) (10.135)

which we recognize as the binary entropy function for a variable whose probability
Appendix B of having the value 1 isλ. Using (10.132), we then obtain an upper bound on the log
sigmoid
lnσ(x)λx−g(λ) (10.136)

Free download pdf