Bandit Algorithms

10.5 Exercises

(a) Implement Algorithm 8 and Algorithm 6, where the latter algorithm should
be tuned for 1/2-subgaussian bandits so that

$$A_t = \operatorname{argmax}_{i \in [k]} \; \hat\mu_i(t-1) + \sqrt{\frac{\log(f(t))}{2\, T_i(t-1)}}\,.$$

(b) Let $n = 10000$ and $k = 2$. Plot the expected regret of each algorithm as a
function of $\Delta$ when $\mu_1 = 1/2$ and $\mu_2 = 1/2 + \Delta$.
(c) Repeat the above experiment with $\mu_1 = 1/10$ and $\mu_1 = 9/10$.
(d) Discuss your results.
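
One possible starting point for parts (a) and (b) is sketched below in Python. It rests on several assumptions that this excerpt does not confirm: that rewards are Bernoulli, that Algorithm 6 is the UCB rule with the index from part (a), that Algorithm 8 is a KL-UCB rule for Bernoulli rewards, and that f(t) = 1 + t log²(t). The function names (`f`, `ucb_index`, `kl_ucb_index`, `run`, `sweep`) are hypothetical helpers chosen for illustration, and the expected regret is only estimated by averaging the pseudo-regret over independent runs.

```python
import math
import random

def f(t):
    # Assumption: f(t) = 1 + t * log(t)^2; the excerpt does not define f.
    return 1.0 + t * math.log(t) ** 2

def ucb_index(mean, count, t):
    # Index from part (a): empirical mean plus sqrt(log f(t) / (2 * T_i(t-1))).
    return mean + math.sqrt(math.log(f(t)) / (2.0 * count))

def bernoulli_kl(p, q):
    # Relative entropy between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1.
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

def kl_ucb_index(mean, count, t, iters=30):
    # Assumed KL-UCB index: the largest q in [mean, 1] with
    # d(mean, q) <= log f(t) / T_i(t-1), located by bisection.
    bound = math.log(f(t)) / count
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if bernoulli_kl(mean, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

def run(index_fn, means, n, rng):
    # Play one Bernoulli bandit for n rounds and return the pseudo-regret,
    # i.e. the sum over rounds of (best mean - mean of the arm played).
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    best = max(means)
    regret = 0.0
    for t in range(1, n + 1):
        if t <= k:
            arm = t - 1  # play every arm once so each count is positive
        else:
            arm = max(range(k),
                      key=lambda i: index_fn(sums[i] / counts[i], counts[i], t))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

def sweep(index_fn, mu1, deltas, n, runs, seed=0):
    # Estimate expected regret for each gap Delta with means (mu1, mu1 + Delta).
    rng = random.Random(seed)
    results = []
    for delta in deltas:
        total = sum(run(index_fn, [mu1, mu1 + delta], n, rng) for _ in range(runs))
        results.append((delta, total / runs))
    return results
```

A call such as `sweep(ucb_index, 0.5, [0.05, 0.1, 0.2, 0.4], n=10000, runs=100)` returns (∆, estimated regret) pairs that can be passed to any plotting tool; swapping in `kl_ucb_index` gives the comparison curve. The bisection makes the KL-based index noticeably slower in pure Python, so a smaller `n` or fewer runs may be convenient while debugging.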
