Bandit Algorithms

(Jeff_L) #1
33.6 Exercises 393

(f) Join the dots to prove Theorem 33.10.

33.10 LetP be a distribution over the measurable setX,μ:X →[0,1] be
measurable,α,δ∈(0,1) and defineμ∗α=inf{y:P(μ(X)< y)≥ 1 −α}. Show
that ifn≥log(1/δ)/log(1/(1−α)) then forX 1 ,...,Xn∼Pindependent, with
probability 1−δit holds that maxi∈[n]μ(Xi)≥μ∗α.

33.11(Multiple optimal arms and soundness) Throughout this exercise
letk >1.

(a)LetE=ENk(1). Prove that for any sound pair (π,τ) andν∈Ewith|i∗(ν)|> 1
it holds thatPνπ(τ=∞) = 1.
(b) Repeat the previous part withE=EBk.
(c)Describe an unstructured class ofk-armed stochastic banditsEandν∈ E
with|i∗(ν)|>1 and sound pair (π,τ) for whichPνπ(τ=∞) = 0.

Free download pdf