Bandit Algorithms

4.10 Exercises

[Figure: expected regret (vertical axis) against the horizon n (horizontal axis) for Follow-the-Leader.]

Figure 4.3 The regret for Follow-the-Leader over 1000 trials on a Bernoulli bandit with means μ₁ = 0.5, μ₂ = 0.6 and horizons ranging from n = 100 to n = 1000.


(c) Explain the plot. Do you think Follow-the-Leader is a good algorithm? Why or why not?
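
The experiment behind Figure 4.3 can be approximated with a short simulation. The following is a minimal sketch, not the book's own code: the function names `follow_the_leader` and `expected_regret` are hypothetical, ties in the empirical means are broken in favour of the lowest arm index (an assumption, since any tie-breaking rule is allowed), and the regret is computed as the pseudo-regret, that is, the suboptimality gap times the number of pulls of each arm.

```python
import random


def follow_the_leader(means, n, rng):
    """Run Follow-the-Leader for n rounds on a Bernoulli bandit.

    Returns the pseudo-regret: sum over arms of (gap * number of pulls).
    """
    k = len(means)
    counts = [0] * k   # number of pulls of each arm
    sums = [0.0] * k   # total reward collected from each arm
    for t in range(n):
        if t < k:
            # Play every arm once before following the leader.
            arm = t
        else:
            # Choose the arm with the largest empirical mean.
            # Assumption: ties go to the lowest index (max returns the first maximum).
            arm = max(range(k), key=lambda i: sums[i] / counts[i])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    best = max(means)
    return sum((best - means[i]) * counts[i] for i in range(k))


def expected_regret(means, n, trials, seed=0):
    """Average the pseudo-regret over independent trials (Monte Carlo estimate)."""
    rng = random.Random(seed)
    total = sum(follow_the_leader(means, n, rng) for _ in range(trials))
    return total / trials
```

Calling `expected_regret([0.5, 0.6], n, trials=1000)` for n = 100, 200, ..., 1000 should give one estimate per horizon, which can then be plotted against n and compared with the figure. The estimates vary with the seed and the tie-breaking rule, so only the overall shape of the curve is meaningful.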
