Pattern Recognition and Machine Learning

(Jeff_L) #1
494 10. APPROXIMATE INFERENCE

x

y
f(x)

λx

x

y
f(x)

λx−g(λ)

−g(λ)

Figure 10.11 In the left-hand plot the red curve shows a convex functionf(x), and the blue line represents the
linear functionλx, which is a lower bound onf(x)becausef(x)>λxfor allx. For the given value of slopeλthe
contact point of the tangent line having the same slope is found by minimizing with respect toxthe discrepancy
(shown by the green dashed lines) given byf(x)−λx. This defines the dual functiong(λ), which corresponds
to the (negative of the) intercept of the tangent line having slopeλ.


exp(−x), we therefore obtain the tangent line in the form

y(x)=exp(−ξ)−exp(−ξ)(x−ξ) (10.126)

which is a linear function parameterized byξ. For consistency with subsequent
discussion, let us defineλ=−exp(−ξ)so that

y(x, λ)=λx−λ+λln(−λ). (10.127)

Different values ofλcorrespond to different tangent lines, and because all such lines
are lower bounds on the function, we havef(x)y(x, λ). Thus we can write the
function in the form

f(x)=max
λ

{λx−λ+λln(−λ)}. (10.128)

We have succeeded in approximating the convex functionf(x)by a simpler, lin-
ear functiony(x, λ). The price we have paid is that we have introduced a variational
parameterλ, and to obtain the tightest bound we must optimize with respect toλ.
We can formulate this approach more generally using the framework ofconvex
duality(Rockafellar, 1972; Jordanet al., 1999). Consider the illustration of a convex
functionf(x)shown in the left-hand plot in Figure 10.11. In this example, the
functionλxis a lower bound onf(x)but it is not the best lower bound that can
be achieved by a linear function having slopeλ, because the tightest bound is given
by the tangent line. Let us write the equation of the tangent line, having slopeλas
λx−g(λ)where the (negative) interceptg(λ)clearly depends on the slopeλof the
tangent. To determine the intercept, we note that the line must be moved vertically by
an amount equal to the smallest vertical distance between the line and the function,
as shown in Figure 10.11. Thus

g(λ)=−min
x
{f(x)−λx}

=max
x

{λx−f(x)}. (10.129)
Free download pdf