Bandit Algorithms

(Jeff_L) #1
6.4 Exercises 95

(e)Modify your choice ofm`and show that the regret of the resulting algorithm
satisfies

Rn≤C


i:∆i> 0

(


∆i+

1


∆ilog max

{


e,n∆^2 i

})


.


Algorithm 2 is due to Auer and Ortner [2010].

6.9(Empirical study) In this exercise you will investigate the empirical behavior
of ETC on a two-armed Gaussian bandit with meansμ 1 = 0 andμ 2 =−∆. Let

R ̄n=

∑n

t=1

∆At,

which is chosen so thatRn=E[R ̄n]. Complete the following:


(a)Using programming language of your choice, write a function that accepts an
integernand ∆>0 and returns the value ofmthat exactly minimizes the
expected regret.
(b) Reproduce Fig. 6.1.
(c)Now fix ∆ = 1/10 and plot the expected regret as a function ofmwith
n= 2000. Your plot should resemble Fig. 6.2.
(d)Plot the standard deviationV[R ̄n]^1 /^2 as a function ofmfor the same bandit
as above. Your plot should resemble Fig. 6.3.
(e)Explain the shape of the curves you observed in Parts (b), (c) and (d) and
reconcile what you see with the theoretical results.
(f)Think, experiment and plot. Is it justified to plotV[R ̄n]^1 /^2 as a summary of
howR ̄nis distributed? Explain your thinking.

Free download pdf