Bandit Algorithms

27.6 Exercises

27.5 (Changing action sets) Provide the necessary corrections to
Algorithm 15 and its analysis to prove the result claimed in Note 4.


Hint: You will need to choose a new exploration distribution in every round;
otherwise everything is more or less the same.
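To make the hint concrete: Algorithm 15 mixes an exploration distribution into the exponential-weights distribution, and with a changing action set $A_t$ this distribution must be recomputed each round. Below is a minimal Python sketch of the classical Frank–Wolfe iteration for an approximately G-optimal design; the function name, tolerances and the finite-action-set representation are our own choices, not the book's.

```python
import numpy as np

def g_optimal_design(actions, iters=1000, tol=1e-3, reg=1e-9):
    """Frank-Wolfe iterations for an (approximately) G-optimal design.

    actions: (k, d) array whose rows are the round's action set A_t.
    Returns weights pi with sum(pi) = 1 such that
    max_a ||a||^2_{V(pi)^{-1}} is close to d (Kiefer-Wolfowitz).
    """
    k, d = actions.shape
    pi = np.full(k, 1.0 / k)              # start from the uniform design
    for _ in range(iters):
        # V(pi) = sum_i pi_i a_i a_i^T, regularised for invertibility
        V = actions.T @ (pi[:, None] * actions) + reg * np.eye(d)
        Vinv = np.linalg.inv(V)
        # g(a) = ||a||^2_{V^{-1}} for every action
        g = np.einsum('ij,jk,ik->i', actions, Vinv, actions)
        j = int(np.argmax(g))
        if g[j] <= d * (1 + tol):         # Kiefer-Wolfowitz optimality check
            break
        # exact line-search step size for the D-optimal design objective
        eta = (g[j] / d - 1.0) / (g[j] - 1.0)
        pi = (1 - eta) * pi
        pi[j] += eta
    return pi

# Each round t, with a fresh action set A_t, the exploration
# distribution would be recomputed before mixing, e.g.
#   P_t = (1 - gamma) * Q_t + gamma * g_optimal_design(A_t)
```

Frank–Wolfe is one standard way to compute the design promised by the Kiefer–Wolfowitz theorem (Chapter 21); any routine guaranteeing $\max_a \|a\|^2_{V(\pi)^{-1}} = O(d)$ on each round's action set would serve the same purpose.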
27.6 (Covering numbers for convex sets) For $K \subset \mathbb{R}^d$ let
$\|x\|_K = \sup_{y \in K} |\langle x, y \rangle|$. Let $A \subset \mathbb{R}^d$
and $L = \{y : \|y\|_A \le 1\}$. Let $N(A, \varepsilon)$ be the size of the
smallest subset $C \subseteq A$ such that $\min_{x' \in C} \|x - x'\|_L \le \varepsilon$
for all $x \in A$. Show the following:

(a) When $A = \{x \in \mathbb{R}^d : \|x\|_{V^{-1}} \le 1\}$ for positive
definite $V$, we have $N(A, \varepsilon) \le (3/\varepsilon)^d$.
(b) When $A$ is convex, bounded and $\operatorname{span}(A) = \mathbb{R}^d$,
we have $N(A, \varepsilon) \le (3d/\varepsilon)^d$.
(c) For any bounded $A \subset \mathbb{R}^d$ we have $N(A, \varepsilon) \le (6d/\varepsilon)^d$.

Hint: For the first part, find a linear map from $A$ to the Euclidean ball and
use the fact that the Euclidean ball can be covered with a set of size
$(3/\varepsilon)^d$. For the second part, use the fact that for any symmetric,
convex and compact set $K$ there exists an ellipsoid $E = \{x : \|x\|_V \le 1\}$
such that $E \subseteq K \subseteq dE$ (a form of John's theorem).
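The covering bound for the Euclidean ball quoted in the hint follows from a standard volume comparison; a brief version of the argument (standard material, not taken from the text) is sketched below.

```latex
% Claim: the Euclidean unit ball B has an \varepsilon-cover of size
% at most (3/\varepsilon)^d for \varepsilon \le 1.
Let $C \subseteq B$ be a maximal set whose points are pairwise more than
$\varepsilon$ apart. By maximality, every $x \in B$ is within $\varepsilon$
of some point of $C$, so $C$ is an $\varepsilon$-cover. The balls
$\{x + (\varepsilon/2)B : x \in C\}$ are pairwise disjoint and contained
in $(1 + \varepsilon/2)B$. Comparing volumes,
\[
  |C|\,(\varepsilon/2)^d \operatorname{vol}(B)
    \le (1 + \varepsilon/2)^d \operatorname{vol}(B),
  \qquad\text{so}\qquad
  |C| \le \Bigl(1 + \frac{2}{\varepsilon}\Bigr)^d
       \le \Bigl(\frac{3}{\varepsilon}\Bigr)^d,
\]
where the last inequality uses $\varepsilon \le 1$.
```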
27.7 (Low rank action sets) In the definition of the algorithm and the proof
of Theorem 27.1 we assumed that $A$ spans $\mathbb{R}^d$ and that it has
positive Lebesgue measure. Show that these assumptions may be relaxed by
carefully adapting the algorithm and analysis.
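One standard device for such a relaxation (a sketch under our own naming, not the book's construction) is to re-express the actions in an orthonormal basis of $\operatorname{span}(A)$, run the algorithm in the lower-dimensional coordinates, and map the chosen action back:

```python
import numpy as np

def reduce_to_span(actions, tol=1e-10):
    """Map actions in R^d into R^m, where m = dim(span(A)).

    actions: (k, d) array whose rows span an m-dimensional subspace.
    Returns (reduced_actions, basis) with reduced_actions of shape (k, m)
    and basis of shape (d, m). basis @ reduced_actions[i] recovers the
    original action, and inner products are preserved:
    <a, y> = <reduced a, basis.T @ y>.
    """
    # Rows of Vt corresponding to nonzero singular values form an
    # orthonormal basis of the row space span(A).
    _, s, Vt = np.linalg.svd(actions, full_matrices=False)
    m = int(np.sum(s > tol))
    basis = Vt[:m].T                  # (d, m), orthonormal columns
    return actions @ basis, basis
```

Since inner products with loss vectors are preserved, the reduction does not change the regret, and the spanning assumption holds by construction in the reduced space.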

27.8 (Necessity of exploration) We saw in Chapter 11 that the exponential
weights algorithm achieves near-optimal regret without mixing in additional
exploration. Show that exploration is crucial here. More precisely, construct
a finite action set $A$ and a reward sequence $y_t \in L$ such that Algorithm 15
with $\gamma = 0$ suffers regret that is very large relative to the optimal
choice of $\gamma$.
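For intuition (an illustration only, not a solution to the exercise): the loss estimates used with Algorithm 15 have the least-squares form $\hat{y}_t = Q_t^{-1} A_t \langle A_t, y_t \rangle$ with $Q_t = \mathbb{E}_{a \sim P_t}[a a^\top]$, and their magnitude is controlled by $\|a\|^2_{Q_t^{-1}}$. The toy Python snippet below, with made-up actions, shows this quantity exploding as the sampling distribution concentrates, which is exactly what unmixed exponential weights allows.

```python
import numpy as np

# Two nearly parallel actions in R^2: once the sampling distribution
# concentrates on a0, the matrix Q becomes nearly singular and the
# loss-estimate variance for a1 blows up.
a0 = np.array([1.0, 0.0])
a1 = np.array([1.0, 0.01])
for p1 in [1e-1, 1e-3, 1e-6]:       # probability mass left on a1
    P = np.array([1.0 - p1, p1])
    Q = P[0] * np.outer(a0, a0) + P[1] * np.outer(a1, a1)
    norm_sq = a1 @ np.linalg.solve(Q, a1)   # ||a1||^2_{Q^{-1}}
    print(f"p1 = {p1:.0e}:  ||a1||^2_(Q^-1) = {norm_sq:.3e}")
```

Here $\|a_1\|^2_{Q^{-1}}$ grows like $1/p_1$, so the estimates for a rarely played action become wildly variable once exponential weights has nearly committed.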


27.9 (Continuous exponential weights) Complete the missing steps in the
proof of Theorem 27.3.

27.10 (Low rank action sets (ii)) In the definition of the algorithm and
the proof of Theorem 27.3 we assumed that $A$ spans $\mathbb{R}^d$ and that it
has positive Lebesgue measure. Show that these assumptions may be relaxed by
carefully adapting the algorithm and analysis.

27.11 (Volume bounds) Prove Proposition 27.4.