27.3 Continuous exponential weights

play the biggest role in the feasibility of sampling from a measure are (a) what is
the form of the measure or its density and (b) how is the convex set represented.
As it happens, the measure defined in the last display is log-concave, which
means that the logarithm of the density, with respect to the Lebesgue measure
on $\mathcal{A}$, is a concave function.


Theorem 27.2. Let $p(a) \propto \mathbb{I}_{\mathcal{A}}(a)\exp(-f(a))$ be a density with respect to the
Lebesgue measure on $\mathcal{A}$ such that $f : \mathcal{A} \to \mathbb{R}$ is a convex function. Then there
exists a polynomial-time algorithm for sampling from $p$ provided one can compute
the following efficiently:

1 (first-order information): $\nabla f(a)$ where $a \in \mathcal{A}$.
2 (Euclidean projections): $\operatorname{argmin}_{x \in \mathcal{A}} \|x - y\|_2$ where $y \in \mathbb{R}^d$.
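Theorem 27.2 does not single out a particular sampler, but one concrete family that uses exactly these two oracles is projected Langevin Monte Carlo, which alternates a gradient-plus-noise step with a Euclidean projection back onto $\mathcal{A}$. The following sketch is only illustrative: the helper names and the toy instance are hypothetical, and step-size, burn-in and mixing-time considerations are ignored.

```python
import numpy as np

def projected_langevin_sample(grad_f, project, d, n_steps=10_000, step=1e-3, rng=None):
    """Run one Markov chain targeting p(a) ∝ exp(-f(a)) on a convex set A.

    grad_f  : callable returning the gradient ∇f(a) (first-order oracle).
    project : callable returning argmin_{x in A} ||x - y||_2 (projection oracle).
    Returns the final iterate; in practice one would discard a burn-in
    period and possibly average over several chains.
    """
    rng = np.random.default_rng(rng)
    a = project(np.zeros(d))                  # start from any feasible point
    for _ in range(n_steps):
        noise = rng.standard_normal(d)
        a = a - step * grad_f(a) + np.sqrt(2 * step) * noise   # Langevin step
        a = project(a)                                          # project back onto A
    return a

# Toy instance: f(a) = ||a||^2 / 2 on the Euclidean unit ball.
grad_f = lambda a: a
project = lambda y: y if np.linalg.norm(y) <= 1 else y / np.linalg.norm(y)
sample = projected_langevin_sample(grad_f, project, d=3)
```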
The probability distribution defined by Eq. (27.4) satisfies the first condition.
Efficiently computing a projection onto a convex set is a more delicate issue. A
general criterion that makes this efficient is access to a separation oracle, which
is a computational procedure $\varphi$ that accepts a point $x \in \mathbb{R}^d$ as input and responds
$\varphi(x) = \text{true}$ if $x \in \mathcal{A}$ and otherwise $\varphi(x) = u$ where $\langle y, u\rangle > \langle x, u\rangle$ for all $y \in \mathcal{A}$
(see Fig. 27.1).


Figure 27.1 Separation oracle returns the normal of a hyperplane that separates $x$ from
$\mathcal{A}$ whenever $x \notin \mathcal{A}$. When $x \in \mathcal{A}$, the separation oracle returns true.
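For concreteness, here is a separation oracle for a Euclidean ball, following the convention above that the returned direction $u$ satisfies $\langle y, u\rangle > \langle x, u\rangle$ for all $y \in \mathcal{A}$. The function name and the choice of set are illustrative assumptions, not part of the text.

```python
import numpy as np

def separation_oracle_ball(x, center, radius):
    """Separation oracle for the Euclidean ball A = {a : ||a - center|| <= radius}.

    Returns True when x lies in A; otherwise returns a vector u with
    <y, u> > <x, u> for every y in A (the convention used in the text).
    """
    gap = x - center
    if np.linalg.norm(gap) <= radius:
        return True
    return -gap / np.linalg.norm(gap)   # unit normal pointing from x towards A

# A point outside the unit ball: the oracle returns a separating direction.
print(separation_oracle_ball(np.array([2.0, 0.0]), np.zeros(2), 1.0))
```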

Theorem 27.3. Assume that $\mathcal{A}$ is compact, convex and has volume $\mathrm{vol}(\mathcal{A}) = \int_{\mathcal{A}} da > 0$. Then an appropriately tuned instantiation of the continuous
exponential weights algorithm with Kiefer–Wolfowitz exploration has regret bounded
by
$$R_n \le 2\sqrt{3dn\left(1 + d\log(2n)\right)}\,.$$
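To get a feel for the scale of this guarantee, the snippet below simply evaluates the bound exactly as displayed above for a few values of $d$ and $n$; it is a plain numeric sanity check, not part of the analysis.

```python
import math

def regret_bound(d, n):
    """Upper bound from Theorem 27.3: R_n <= 2 * sqrt(3 d n (1 + d log(2n)))."""
    return 2 * math.sqrt(3 * d * n * (1 + d * math.log(2 * n)))

# The bound grows roughly like d * sqrt(n log n):
for d, n in [(2, 10_000), (10, 10_000), (10, 1_000_000)]:
    print(d, n, round(regret_bound(d, n)))
```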
The proof of Theorem 27.3 relies on the following upper bound on the
logarithmic Laplace transform of the uniform measure on a convex set $K$, which
we leave as an exercise (Exercise 27.11).


Proposition 27.4. Let $K \subset \mathbb{R}^d$ be a compact convex set with $\mathrm{vol}(K) > 0$ and
$u \in \mathbb{R}^d$ and $x^* = \operatorname{argmin}_{x \in K} \langle x, u\rangle$. Then
$$\log\left(\frac{\mathrm{vol}(K)}{\int_K \exp\left(-\langle x - x^*, u\rangle\right) dx}\right) \le 1 + \max\left(0,\; d\log\left(\sup_{x,y \in K} \langle x - y, u\rangle\right)\right)\,.$$
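Although the proof is left as Exercise 27.11, the inequality is easy to sanity-check numerically. The snippet below does a Monte Carlo check for the hypercube $K = [0,1]^d$ with a nonnegative $u$, in which case $x^* = 0$ and $\sup_{x,y \in K}\langle x - y, u\rangle = \sum_i u_i$; the function name and the specific choice of $K$ and $u$ are illustrative assumptions.

```python
import numpy as np

def lhs_rhs(u, n_samples=200_000, seed=0):
    """Monte Carlo check of Proposition 27.4 for K = [0, 1]^d (so vol(K) = 1).

    For u with nonnegative entries, x* = argmin_{x in K} <x, u> is the origin,
    and sup_{x, y in K} <x - y, u> equals sum(u).
    """
    rng = np.random.default_rng(seed)
    d = len(u)
    x = rng.random((n_samples, d))                 # uniform samples from K
    integral = np.mean(np.exp(-x @ u))             # ≈ ∫_K exp(-<x - x*, u>) dx
    lhs = np.log(1.0 / integral)                   # log(vol(K) / integral)
    rhs = 1 + max(0.0, d * np.log(np.sum(u)))
    return lhs, rhs

print(lhs_rhs(np.array([3.0, 5.0])))   # the first value should not exceed the second
```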
