Bandit Algorithms

36.7 Exercises 445

by Abeille and Lazaric [2017a] and a spectral version by Koc ́ak et al. [2014]. A recent paper analyzes the combinatorial semibandit setting [Wang and Chen, 2018]. There is a tutorial on Thompson sampling by Russo et al. [2017] that focuses mostly on applications and computational issues. We mentioned there are other ways to configure Algorithm 23, for example the recent article by Kveton et al. [2018].

36.7 Exercises

36.1(Equivalent views) Prove the claimed equivalences in Note 1.

36.2(Filling in steps in the proof of Theorem 36.1 I.) Consider the eventEdefined in Theorem 36.1 and prove thatP(Ec)≤nkδ.

36.3(Filling in steps in the proof of Theorem 36.1 II.) Prove Eq. (36.1).

36.4(Removing logarithmic factors) Improve the bound in Theorem 36.1 to show that BRn≤C

√

knwhereC >0 is a universal constant.

Hint Replace the naive confidence intervals used in the proof of Theorem 36.1 by the more refined confidence bounds used in Chapter 9. The source for this result is the paper by Bubeck and Liu [2013]. 36.5(Filling in steps in the proof of Theorem 36.2) LetGi(s) = 1 −Fis(μ 1 −ε). Show that

(a)

∑

t∈T

I{At=i}≤

∑n

s=1

I{Gi(s−1)> 1 /n}.

(b)E

[

∑

t/∈T

I{Eic(t)}

]

≤E

[

∑

t/∈T

1 /n

]

.

36.6(Frequentist bound for Thompson sampling) In this exercise you will prove Theorem 36.3.

(a) Show there exists a universal constantc >0 such that

E

[∞

∑

t=1

(

1

G 1 s(ε)

− 1

)]

≤ c ε^2

log

(

1

ε

)

.

(b) Show that

E

[n ∑

s=1

I{Gis> 1 /n}

]

≤ 2 log(n) (∆i−ε)^2

+o(log(n)).

(c)Use Theorem 36.2 and the fundamental regret decomposition (Lemma 4.5) to prove Theorem 36.3.

Bandit Algorithms

36.7 Exercises

√

∑

[

∑

]

≤E

[

∑

]

.

E

[∞

∑

(

1

− 1

)]

(

1

)

.

E

]

Get our desktop app

Company

Features

Documentation

Resources