12.5 Exercises 167(c) Show that with probability at least 1−δ,(B)≤ 2
∑nt=1∑ka=1Ptaβta+log(1/δ)
η.
(d) Show that with probability at least 1−kδ,
(C)≤log(1/δ)
η.
(e) Conclude that for anyδ≤ 1 /(k+ 1), with probability at least 1−(k+ 1)δ,Rn≤3 log(1/δ)
η+η∑nt=1∑ka=1PtaZˆ^2 ta+ 5∑nt=1∑ka=1Ptaβta.Hint This is a long and challenging exercise. You may find it helpful to use
the result in Exercise 5.15. The solution is also available.12.3(Implementation) Consider the Bernoulli bandit withk= 5 arms and
n= 10^4 with meansμ 1 = 1/2 andμi= 1/ 2 −∆ fori >1. Plot the regret of Exp3
and Exp3-IX for ∆∈[0, 1 /2]. You should get something similar to the graph in
Fig. 12.1. Does the result surprise you? Repeat the experiments in Exercise 11.5
with Exp3-IX and convince yourself that this algorithm is more robust than
Exp3.
(^10000). 1 0. 2 0. 3 0. 4 0. 5
150
200
250
300
∆
Regret
Exp3
Exp3-IX
Figure 12.1Comparison between Exp3 and Exp3-IX on Bernoulli bandit
12.4(Expected regret of Exp3-IX) In this exercise you will complete the
steps explained in Note 1 to prove a bound on the expected regret of Exp3-IX.
(a) Find a choice ofηand universal constantC >0 such that
Rn≤C
√
knlog(k).