Bandit Algorithms

(Jeff_L) #1
12.5 Exercises 167

(c) Show that with probability at least 1−δ,

(B)≤ 2


∑n

t=1

∑k

a=1

Ptaβta+

log(1/δ)
η

.


(d) Show that with probability at least 1−kδ,


(C)≤

log(1/δ)
η

.


(e) Conclude that for anyδ≤ 1 /(k+ 1), with probability at least 1−(k+ 1)δ,

Rn≤

3 log(1/δ)
η


∑n

t=1

∑k

a=1

PtaZˆ^2 ta+ 5

∑n

t=1

∑k

a=1

Ptaβta.

Hint This is a long and challenging exercise. You may find it helpful to use
the result in Exercise 5.15. The solution is also available.

12.3(Implementation) Consider the Bernoulli bandit withk= 5 arms and
n= 10^4 with meansμ 1 = 1/2 andμi= 1/ 2 −∆ fori >1. Plot the regret of Exp3
and Exp3-IX for ∆∈[0, 1 /2]. You should get something similar to the graph in
Fig. 12.1. Does the result surprise you? Repeat the experiments in Exercise 11.5
with Exp3-IX and convince yourself that this algorithm is more robust than
Exp3.


(^10000). 1 0. 2 0. 3 0. 4 0. 5
150
200
250
300

Regret
Exp3
Exp3-IX
Figure 12.1Comparison between Exp3 and Exp3-IX on Bernoulli bandit
12.4(Expected regret of Exp3-IX) In this exercise you will complete the
steps explained in Note 1 to prove a bound on the expected regret of Exp3-IX.
(a) Find a choice ofηand universal constantC >0 such that
Rn≤C



knlog(k).
Free download pdf