Bandit Algorithms

10.5 Exercises

(a) Implement Algorithm 8 and Algorithm 6, where the latter algorithm should be tuned for 1/2-subgaussian bandits so that

$$A_t = \operatorname{argmax}_{i \in [k]} \left( \hat\mu_i(t-1) + \sqrt{\frac{\log f(t)}{2 T_i(t-1)}} \right).$$

(b) Let $n = 10000$ and $k = 2$. Plot the expected regret of each algorithm as a function of $\Delta$ when $\mu_1 = 1/2$ and $\mu_2 = 1/2 + \Delta$.
(c) Repeat the above experiment with $\mu_1 = 1/10$ and then with $\mu_1 = 9/10$, keeping $\mu_2 = \mu_1 + \Delta$.
(d) Discuss your results.
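A minimal Python sketch for parts (a) to (c) follows, offered as one possible starting point rather than a reference solution. It rests on several assumptions not stated in the exercise. Algorithm 8 is taken to be KL-UCB with index $\max\{q \in [0,1] : d(\hat\mu_i(t-1), q) \le \log f(t) / T_i(t-1)\}$, where $d$ is the Bernoulli relative entropy. Both policies use the exploration function $f(t) = 1 + t \log^2(t)$. Rewards are Bernoulli, and each arm is played once before the indices are used. The KL-UCB index is computed by bisection, which is an implementation choice. Expected regret is estimated by averaging the pseudo-regret $\sum_t (\mu^* - \mu_{A_t})$ over independent runs. All function names (`ucb_index`, `kl_ucb_index`, `run`, `expected_regret`) are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

# Shared index signature (assumed): (empirical mean, pulls T_i(t-1), round t) -> index.
Index = Callable[[float, int, int], float]


def f(t: int) -> float:
    # Assumed exploration function f(t) = 1 + t log^2(t), used by both policies.
    return 1.0 + t * math.log(t) ** 2


def kl_bernoulli(p: float, q: float) -> float:
    # Relative entropy d(p, q) between Bernoulli(p) and Bernoulli(q), clamped to avoid log(0).
    eps = 1e-15
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def ucb_index(mean: float, pulls: int, t: int) -> float:
    # UCB tuned for 1/2-subgaussian noise: mean + sqrt(log f(t) / (2 T_i(t-1))).
    return mean + math.sqrt(math.log(f(t)) / (2 * pulls))


def kl_ucb_index(mean: float, pulls: int, t: int, iters: int = 30) -> float:
    # Largest q in [mean, 1] with d(mean, q) <= log f(t) / T_i(t-1), found by bisection.
    # Bisection is valid because d(mean, q) is increasing in q on [mean, 1].
    target = math.log(f(t)) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo


def run(index: Index, means: Sequence[float], n: int, rng: random.Random) -> float:
    # One Bernoulli bandit run of length n; returns the pseudo-regret sum_t (mu* - mu_{A_t}).
    k = len(means)
    pulls = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # play each arm once so that T_i(t-1) > 0
        else:
            arm = max(range(k), key=lambda i: index(sums[i] / pulls[i], pulls[i], t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def expected_regret(index: Index, mu1: float, deltas: Sequence[float],
                    n: int = 10_000, reps: int = 100, seed: int = 0) -> List[float]:
    # Monte Carlo estimate of the expected regret for each Delta, with means (mu1, mu1 + Delta).
    rng = random.Random(seed)
    estimates = []
    for delta in deltas:
        assert 0.0 <= mu1 + delta <= 1.0, "mu2 = mu1 + Delta must be a valid Bernoulli mean"
        means = [mu1, mu1 + delta]
        total = sum(run(index, means, n, rng) for _ in range(reps))
        estimates.append(total / reps)
    return estimates
```

For part (b), one might call `expected_regret(ucb_index, 0.5, deltas)` and `expected_regret(kl_ucb_index, 0.5, deltas)` on a grid such as `deltas = [i / 100 for i in range(1, 50)]` and plot both curves against `deltas`. For part (c), the same calls work with `mu1=0.1` and `mu1=0.9`. In the latter case the grid must satisfy $\Delta \le 1/10$ so that $\mu_2 \le 1$. Pure Python is slow at $n = 10000$, so reducing `reps` or vectorizing over runs may be necessary.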
