10.5 Exercises
(a) Implement Algorithm 8 and Algorithm 6, where the latter algorithm should be tuned for 1/2-subgaussian bandits so that
$$A_t = \operatorname{argmax}_{i \in [k]} \; \hat\mu_i(t-1) + \sqrt{\frac{\log(f(t))}{2\, T_i(t-1)}}\,.$$
(b) Let $n = 10000$ and $k = 2$. Plot the expected regret of each algorithm as a function of $\Delta$ when $\mu_1 = 1/2$ and $\mu_2 = 1/2 + \Delta$.
(c) Repeat the above experiment with $\mu_1 = 1/10$ and $\mu_1 = 9/10$.
(d) Discuss your results.
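
One possible starting point for parts (a) and (b) is sketched below in Python. It is not a reference solution and rests on assumptions that this exercise does not state: rewards are taken to be Bernoulli, Algorithm 8 is taken to be a KL-UCB rule whose index is the largest $q \in [0, 1]$ with $d(\hat\mu_i(t-1), q) \le \log(f(t)) / T_i(t-1)$, where $d$ is the relative entropy between Bernoulli distributions, and the exploration function is taken to be $f(t) = 1 + t \log^2(t)$. The function names (`ucb_index`, `kl_ucb_index`, `run`, `regret_curve`) are hypothetical, and the expected regret is estimated by averaging the pseudo-regret over independent runs.

```python
import math
import random


def f(t):
    # Assumed exploration function; the exercise text does not define f.
    return 1.0 + t * math.log(t) ** 2


def kl_bernoulli(p, q):
    """Relative entropy d(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
    value = 0.0
    if p > 0.0:
        value += p * math.log(p / q)
    if p < 1.0:
        value += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return value


def ucb_index(mean, count, t):
    """Index from part (a): mean plus sqrt(log f(t) / (2 T_i(t-1)))."""
    return mean + math.sqrt(math.log(f(t)) / (2.0 * count))


def kl_ucb_index(mean, count, t, iters=30):
    """Assumed KL-UCB index: largest q in [mean, 1] with
    d(mean, q) <= log f(t) / count, located by bisection."""
    level = math.log(f(t)) / count
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mean, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo


def run(means, n, index, rng):
    """Play n rounds on a Bernoulli bandit and return the pseudo-regret."""
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # play every arm once before using the index
        else:
            arm = max(
                range(k),
                key=lambda i: index(sums[i] / counts[i], counts[i], t),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def regret_curve(deltas, mu1, n, index, runs, seed=0):
    """Estimate the expected regret for mu = (mu1, mu1 + delta), per delta."""
    rng = random.Random(seed)
    curve = []
    for delta in deltas:
        total = sum(run([mu1, mu1 + delta], n, index, rng) for _ in range(runs))
        curve.append((delta, total / runs))
    return curve
```

For part (b), one could call `regret_curve` with `mu1 = 0.5`, `n = 10000`, a grid of values such as `[0.05 * j for j in range(1, 11)]` for `deltas`, and each of `ucb_index` and `kl_ucb_index` in turn, then plot the two resulting curves on the same axes. The number of runs needed for a smooth curve is a judgment call; the KL-UCB bisection makes each run noticeably slower than the closed-form index.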