Bandit Algorithms

(Jeff_L) #1
36.7 Exercises 445

by Abeille and Lazaric [2017a] and a spectral version by Koc ́ak et al. [2014]. A
recent paper analyzes the combinatorial semibandit setting [Wang and Chen,
2018]. There is a tutorial on Thompson sampling by Russo et al. [2017] that
focuses mostly on applications and computational issues. We mentioned there are
other ways to configure Algorithm 23, for example the recent article by Kveton
et al. [2018].

36.7 Exercises


36.1(Equivalent views) Prove the claimed equivalences in Note 1.

36.2(Filling in steps in the proof of Theorem 36.1 I.) Consider the
eventEdefined in Theorem 36.1 and prove thatP(Ec)≤nkδ.

36.3(Filling in steps in the proof of Theorem 36.1 II.) Prove Eq. (36.1).

36.4(Removing logarithmic factors) Improve the bound in Theorem 36.1
to show that BRn≤C


knwhereC >0 is a universal constant.

Hint Replace the naive confidence intervals used in the proof of Theorem 36.1
by the more refined confidence bounds used in Chapter 9. The source for this
result is the paper by Bubeck and Liu [2013].
36.5(Filling in steps in the proof of Theorem 36.2) LetGi(s) =
1 −Fis(μ 1 −ε). Show that

(a)


t∈T

I{At=i}≤

∑n

s=1

I{Gi(s−1)> 1 /n}.

(b)E

[



t/∈T

I{Eic(t)}

]


≤E


[



t/∈T

1 /n

]


.


36.6(Frequentist bound for Thompson sampling) In this exercise you
will prove Theorem 36.3.

(a) Show there exists a universal constantc >0 such that

E


[∞



t=1

(


1


G 1 s(ε)

− 1


)]


≤ c
ε^2

log

(


1


ε

)


.


(b) Show that

E


[n

s=1

I{Gis> 1 /n}

]


≤ 2 log(n)
(∆i−ε)^2

+o(log(n)).

(c)Use Theorem 36.2 and the fundamental regret decomposition (Lemma 4.5)
to prove Theorem 36.3.
Free download pdf