12.5 Exercises 167
(c) Show that with probability at least 1−δ,
(B)≤ 2
∑n
t=1
∑k
a=1
Ptaβta+
log(1/δ)
η
.
(d) Show that with probability at least 1−kδ,
(C)≤
log(1/δ)
η
.
(e) Conclude that for anyδ≤ 1 /(k+ 1), with probability at least 1−(k+ 1)δ,
Rn≤
3 log(1/δ)
η
+η
∑n
t=1
∑k
a=1
PtaZˆ^2 ta+ 5
∑n
t=1
∑k
a=1
Ptaβta.
Hint This is a long and challenging exercise. You may find it helpful to use
the result in Exercise 5.15. The solution is also available.
12.3(Implementation) Consider the Bernoulli bandit withk= 5 arms and
n= 10^4 with meansμ 1 = 1/2 andμi= 1/ 2 −∆ fori >1. Plot the regret of Exp3
and Exp3-IX for ∆∈[0, 1 /2]. You should get something similar to the graph in
Fig. 12.1. Does the result surprise you? Repeat the experiments in Exercise 11.5
with Exp3-IX and convince yourself that this algorithm is more robust than
Exp3.
(^10000). 1 0. 2 0. 3 0. 4 0. 5
150
200
250
300
∆
Regret
Exp3
Exp3-IX
Figure 12.1Comparison between Exp3 and Exp3-IX on Bernoulli bandit
12.4(Expected regret of Exp3-IX) In this exercise you will complete the
steps explained in Note 1 to prove a bound on the expected regret of Exp3-IX.
(a) Find a choice ofηand universal constantC >0 such that
Rn≤C
√
knlog(k).