Bandit Algorithms


30.7 Exercises


30.1 (Mirror descent for combinatorial bandits) Prove Theorem 30.1.

30.2 (Expected supremum norm of Laplace) Let $Z$ be sampled from the
measure on $\mathbb{R}^d$ with density $f(z) = 2^{-d}\exp(-\|z\|_1)$. The purpose of
this exercise is to show that
\[
\mathbb{E}[\|Z\|_\infty] = \sum_{i=1}^d \frac{1}{i}\,. \tag{30.13}
\]


(a) Let $X_1,\ldots,X_d$ be independent standard exponentials. Show that
$\|Z\|_\infty$ and $\max\{X_1,\ldots,X_d\}$ have the same law.
(b) Let $M_j = \max_{i \le j} X_i$. Prove for $j \ge 2$ that
\[
\mathbb{E}[M_j] = \mathbb{E}[M_{j-1}] + \mathbb{E}[\exp(-M_{j-1})]\,.
\]

(c) Prove by induction or otherwise that for all $a, j \in \{1, 2, \ldots\}$,
\[
\mathbb{E}[\exp(-a M_j)] = \frac{a!}{\prod_{b=1}^{a}(j+b)}\,.
\]


(d) Prove the claim in Eq. (30.13).
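
As an informal sanity check of Eq. (30.13), separate from the proof the
exercise asks for, one can compare a Monte Carlo estimate of
$\mathbb{E}[\|Z\|_\infty]$ against the harmonic sum. The sketch below assumes
numpy; the dimension and sample count are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_samples = 5, 200_000

# Independent Laplace(0, 1) coordinates give exactly the joint density
# f(z) = 2^{-d} exp(-||z||_1) from the exercise statement.
Z = rng.laplace(loc=0.0, scale=1.0, size=(n_samples, d))
empirical = np.abs(Z).max(axis=1).mean()

harmonic = sum(1.0 / i for i in range(1, d + 1))
print(f"Monte Carlo E[||Z||_inf] = {empirical:.4f}, harmonic sum = {harmonic:.4f}")
```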

30.3 (Gradient of expected support function) Let $A \subset \mathbb{R}^d$ be a compact
convex set and $\phi(x) = \max_{a \in A} \langle a, x \rangle$ its support function. Let $Q$ be a measure
on $\mathbb{R}^d$ that is absolutely continuous with respect to the Lebesgue measure and
let $Z \sim Q$. Show that
\[
\nabla \mathbb{E}[\phi(x + Z)] = \mathbb{E}\Big[\operatorname*{argmax}_{a \in A} \langle a, x + Z \rangle\Big]\,.
\]
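
The identity can be checked numerically for a concrete choice of $A$. The
sketch below, assuming numpy, takes $A$ to be the probability simplex, so that
$\phi(x) = \max_i x_i$ and the argmax is a standard basis vector, and compares
a finite-difference gradient of $\mathbb{E}[\phi(x + Z)]$ with the empirical
distribution of the argmax. The Gaussian choice of $Q$ and all constants are
illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_samples, eps = 3, 500_000, 1e-3
x = np.array([0.3, 0.1, -0.2])
Z = rng.normal(size=(n_samples, d))  # Q = standard Gaussian, absolutely continuous

# Right-hand side: E[argmax_{a in A} <a, x + Z>], an average of basis vectors.
idx = (x + Z).argmax(axis=1)
rhs = np.bincount(idx, minlength=d) / n_samples

# Left-hand side: central finite differences of E[phi(x + Z)], reusing the
# same Z (common random numbers) to keep the Monte Carlo noise small.
def smoothed(y):
    return (y + Z).max(axis=1).mean()

lhs = np.array([(smoothed(x + eps * e) - smoothed(x - eps * e)) / (2 * eps)
                for e in np.eye(d)])
print("finite-difference gradient:", lhs)
print("E[argmax]:                 ", rhs)
```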

30.4 (Minimax bound for combinatorial semibandits) Adapt the analysis
in Exercise 28.13 to derive an algorithm for combinatorial bandits with semibandit
feedback for which the regret is $R_n \le C\sqrt{mdn}$ for a universal constant $C > 0$.
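
For context, analyses of this kind are typically built around the
importance-weighted loss estimator for semibandit feedback, where each chosen
coordinate's observed loss is divided by its marginal inclusion probability.
A minimal sketch of that estimator follows; the names and interface are
illustrative assumptions, not taken from Exercise 28.13.

```python
import numpy as np

def semibandit_loss_estimate(action, losses, probs):
    """Unbiased importance-weighted estimate of the full loss vector.

    action: vector in {0,1}^d with exactly m ones (the played action).
    losses: d-vector whose entries are only meaningful where action == 1,
            as dictated by semibandit feedback.
    probs:  marginal inclusion probabilities P(action_i = 1) under the
            algorithm's sampling distribution; assumed positive where used.
    """
    est = np.zeros_like(probs, dtype=float)
    chosen = action.astype(bool)
    # E[action_i * losses_i / probs_i] = losses_i, so est is unbiased.
    est[chosen] = losses[chosen] / probs[chosen]
    return est

# Example: d = 4, m = 2, with marginal inclusion probabilities m/d = 0.5.
a = np.array([1, 0, 1, 0])
y = np.array([0.2, 0.0, 0.7, 0.0])
print(semibandit_loss_estimate(a, y, np.full(4, 0.5)))  # [0.4, 0, 1.4, 0]
```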

30.5 (Lower bound for combinatorial semibandits) Let $m \ge 1$ and assume
that $d = km$ for some $k > 1$. Prove that for any algorithm there exists a combinatorial
semibandit such that $R_n \ge c \min\{nm, \sqrt{mdn}\}$, where $c > 0$ is a universal constant.

Hint The most obvious choice is $A = \{a \in \{0,1\}^d : \|a\|_1 = m\}$, whose
elements are sometimes called $m$-sets. A lower bound does hold for this action set
[Lattimore et al., 2018]. However, an easier path is to impose a little additional
structure and analyze the multitask bandit setting.

30.6 (Follow the perturbed leader for $m$-sets) Use the ideas in Note 5 to
prove that FTPL has $R_n = \tilde{O}(\sqrt{mnd})$ regret when $A = \{a \in \{0,1\}^d : \|a\|_1 = m\}$.

Hint After proving that the off-diagonal elements of the Hessian are negative,
you will also need to tune the learning rate. We do not know of a source for this
result, but the full information case was studied by Cohen and Hazan [2015].
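
A minimal sketch of the action-selection step of FTPL for $m$-sets may help fix
ideas: perturb the cumulative loss estimates with Laplace noise and play the $m$
coordinates with the smallest perturbed losses. The learning rate `eta` and all
names here are illustrative assumptions, not the tuned choice the exercise asks for.

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, eta = 10, 3, 0.1
loss_hat = np.zeros(d)  # cumulative loss estimates, updated elsewhere

def ftpl_action(loss_hat):
    # Perturb the cumulative losses with Laplace noise of scale 1/eta and
    # play the m coordinates with the smallest perturbed losses, which is
    # the maximizer of <a, noise - eta * loss_hat> over the m-sets.
    perturbed = loss_hat - rng.laplace(scale=1.0 / eta, size=d)
    action = np.zeros(d)
    action[np.argsort(perturbed)[:m]] = 1.0
    return action

print(ftpl_action(loss_hat))  # an element of A with ||a||_1 = m
```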