
[Figure 26.4: Illustration of the first-order optimality conditions, showing a convex set with arrows −∇f(x) and −∇f(x∗) at two boundary points. The point at the top is not a minimizer because the hyperplane whose normal is the gradient does not support the convex set, so a nearby better point exists. The point at the right is a minimizer.]

where the last step follows from the useful inequality ax − bx² ≤ a²/(4b), which holds for all x ∈ ℝ and b > 0.
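For completeness, this inequality is just completing the square (a one-line derivation added here):

\[
ax - bx^2 \;=\; \frac{a^2}{4b} - b\left(x - \frac{a}{2b}\right)^2 \;\le\; \frac{a^2}{4b}.
\]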

26.5 Optimization


The first-order optimality condition states that if x ∈ ℝᵈ is a minimizer of a differentiable function f : ℝᵈ → ℝ, then ∇f(x) = 0. One of the things we like about convex functions is that when f is convex the first-order optimality condition is both necessary and sufficient. In particular, if ∇f(x) = 0 for some x ∈ ℝᵈ, then x is a minimizer of f. The first-order optimality condition can also be generalized to constrained minima: if f : ℝᵈ → ℝ is convex and differentiable and A ⊆ ℝᵈ is a nonempty convex set, then

x∗ ∈ argmin_{x ∈ A} f(x)  ⇔  ⟨∇f(x∗), x − x∗⟩ ≥ 0 for all x ∈ A.    (26.2)
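As a concrete sanity check (a sketch added here, not from the text), the following Python snippet verifies (26.2) for the convex function f(x) = (x − 2)² on A = [−1, 1], whose constrained minimizer is x∗ = 1:

    import numpy as np

    # Check (26.2) numerically for f(x) = (x - 2)^2 on A = [-1, 1].
    # The unconstrained minimum x = 2 lies outside A, so the constrained
    # minimizer sits on the boundary at x_star = 1.
    def grad_f(x):
        return 2.0 * (x - 2.0)  # gradient of f(x) = (x - 2)^2

    x_star = 1.0
    A = np.linspace(-1.0, 1.0, 101)  # a grid over the constraint set

    # <grad f(x*), x - x*> >= 0 must hold for every x in A ...
    assert np.all(grad_f(x_star) * (A - x_star) >= 0.0)

    # ... while at a non-minimizer such as x = 0 the condition fails:
    assert np.any(grad_f(0.0) * (A - 0.0) < 0.0)
    print("(26.2) holds at x* = 1 and fails at x = 0")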

The necessity of this condition is easy to understand by geometric reasoning. If ∇f(x∗) = 0 then the condition trivially holds. If ∇f(x∗) ≠ 0, the hyperplane H_{x∗} that passes through x∗ with normal ∇f(x∗) must be a supporting hyperplane of A at x∗, with −∇f(x∗) being the outer normal of A at x∗; otherwise x∗ could be moved by a small amount while staying inside A and improving the value of f. Since A is convex, it thus lies entirely on the side of H_{x∗} that ∇f(x∗) points into. This is exactly equivalent to (26.2). The sufficiency of the condition also follows from this geometric viewpoint, as the reader may verify from the figure.
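Algebraically, sufficiency is immediate from convexity (a one-line argument added here for completeness): since a differentiable convex function lies above its tangent planes, the right-hand side of (26.2) gives

\[
f(x) \;\ge\; f(x^*) + \langle \nabla f(x^*),\, x - x^* \rangle \;\ge\; f(x^*) \qquad \text{for all } x \in A.
\]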
The above statement continues to hold, with a small modification, even when f is not differentiable everywhere. In particular, in this case the equivalence (26.2) holds for any x∗ ∈ dom(∇f), with the modification that on the right-hand side of the equivalence A should be replaced by A ∩ dom(f):

x∗ ∈ argmin_{x ∈ A} f(x)  ⇔  ⟨∇f(x∗), x − x∗⟩ ≥ 0 for all x ∈ A ∩ dom(f).
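As a worked instance of the modified statement (an example added here, not from the text), let f(x) = −√x for x ≥ 0 and f(x) = ∞ otherwise, and let A = [−1, 1], so that A ∩ dom(f) = [0, 1]. Since f is decreasing on [0, 1], the minimizer is x∗ = 1 ∈ dom(∇f) with ∇f(x∗) = −1/2, and indeed

\[
\langle \nabla f(x^*),\, x - x^* \rangle \;=\; \frac{1 - x}{2} \;\ge\; 0 \qquad \text{for all } x \in A \cap \operatorname{dom}(f).
\]

Note that the point x = 0 cannot be tested this way, since 0 ∉ dom(∇f).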