Bandit Algorithms

6.4 Exercises

[Plot: expected regret (vertical axis) against m (horizontal axis) for Explore-Then-Commit]

Figure 6.2 Expected regret for Explore-Then-Commit over $10^5$ trials on a Gaussian bandit with means $\mu_1 = 0$, $\mu_2 = -1/10$.
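For two arms the curve in Figure 6.2 has a closed form, which makes the trade-off in m explicit. The sketch below assumes unit-variance Gaussian rewards and writes $\Delta = \mu_1 - \mu_2 = 1/10$. During exploration the worse arm is pulled exactly m times, costing $m\Delta$. Afterwards ETC commits to the worse arm for the remaining $n - 2m$ rounds only if $\hat\mu_2 > \hat\mu_1$. Since $\hat\mu_1 - \hat\mu_2 \sim \mathcal{N}(\Delta, 2/m)$, this happens with probability $\Phi(-\Delta\sqrt{m/2})$, which gives

$$R_n = m\Delta + (n - 2m)\Delta\,\Phi\big(-\Delta\sqrt{m/2}\big).$$

The function names are hypothetical. The caption does not state the horizon n, so the value n = 1000 in the usage block is only an assumption.

```python
import math


def misidentification_prob(m: int, delta: float) -> float:
    """Probability that ETC commits to the worse of two unit-variance
    Gaussian arms after pulling each arm m times (assumed setting).

    mu_hat_1 - mu_hat_2 ~ N(delta, 2/m), so the error probability is
    Phi(-delta * sqrt(m / 2)).
    """
    if m < 1:
        raise ValueError("each arm must be explored at least once")
    # Phi(-x) = erfc(x / sqrt(2)) / 2 with x = delta * sqrt(m / 2),
    # i.e. erfc(delta * sqrt(m) / 2) / 2.
    return 0.5 * math.erfc(delta * math.sqrt(m) / 2.0)


def etc_expected_regret(n: int, m: int, delta: float) -> float:
    """Expected pseudo-regret of two-armed ETC with horizon n and m pulls per arm."""
    if 2 * m > n:
        raise ValueError("exploration phase is longer than the horizon")
    exploration = m * delta  # the worse arm is pulled exactly m times
    commit = (n - 2 * m) * delta * misidentification_prob(m, delta)
    return exploration + commit


if __name__ == "__main__":
    n = 1000  # hypothetical horizon; the caption does not state n
    for m in (10, 50, 100, 200, 400):
        print(m, round(etc_expected_regret(n, m, 0.1), 2))
```

For a fixed horizon the two terms pull in opposite directions. A small m makes the commit phase risky, and a large m spends $m\Delta$ on exploration.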

[Plot: standard deviation of the regret (vertical axis) against m (horizontal axis) for Explore-Then-Commit]

Figure 6.3 Standard deviation of the regret for ETC over $10^5$ trials on a Gaussian bandit with means $\mu_1 = 0$, $\mu_2 = -1/10$.
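The large standard deviation in Figure 6.3 comes from how ETC commits. Apart from the deterministic exploration cost, the two-armed regret is $(n - 2m)\Delta$ times a Bernoulli variable. Its standard deviation is therefore $(n - 2m)\Delta\sqrt{p(1-p)}$, where p is the probability of committing to the wrong arm. The Monte Carlo sketch below runs ETC itself and estimates both the mean and the standard deviation of the regret. It makes several assumptions:

- rewards are unit-variance Gaussian;
- regret is measured as the pseudo-regret $\sum_t \Delta_{A_t}$, although the figures may use a different definition;
- the horizon n = 1000 is hypothetical;
- only 1000 trials are run, instead of the $10^5$ used for the figures.

`run_etc` and `regret_stats` are illustrative names.

```python
import math
import random


def run_etc(means, n, m, rng):
    """Run Explore-Then-Commit once and return its pseudo-regret.

    Assumes unit-variance Gaussian rewards and at least one pull per arm.
    """
    k = len(means)
    if m < 1 or m * k > n:
        raise ValueError("need 1 <= m and m * k <= n")
    best = max(means)
    sums = [0.0] * k
    regret = 0.0
    # Exploration: play the arms round robin until each has m samples.
    for t in range(m * k):
        arm = t % k
        sums[arm] += rng.gauss(means[arm], 1.0)
        regret += best - means[arm]
    # Commit to the largest empirical mean (max() keeps the lowest index on ties).
    chosen = max(range(k), key=lambda i: sums[i] / m)
    # Rewards in the commit phase do not affect pseudo-regret, so they are not sampled.
    regret += (n - m * k) * (best - means[chosen])
    return regret


def regret_stats(means, n, m, trials, seed=0):
    """Monte Carlo estimate of the mean and standard deviation of the regret."""
    rng = random.Random(seed)
    samples = [run_etc(means, n, m, rng) for _ in range(trials)]
    mean = sum(samples) / trials
    var = sum((r - mean) ** 2 for r in samples) / trials
    return mean, math.sqrt(var)


if __name__ == "__main__":
    means = [0.0, -0.1]  # mu_1 = 0, mu_2 = -1/10 as in the captions
    n = 1000  # hypothetical horizon
    for m in (10, 50, 100, 200, 400):
        avg, std = regret_stats(means, n, m, trials=1000)
        print(f"m={m:3d}  mean={avg:6.2f}  std={std:6.2f}")
```

When m is small, p is close to 1/2 and each run either pays almost nothing after exploration or pays about $(n - 2m)\Delta$. That spread is what drives the standard deviation up.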