Bandit Algorithms


which may be a different choice than mirror descent.
Example 28.3. The exponential weights algorithm that appeared in various
forms on numerous occasions in earlier chapters is a special case of mirror descent,
corresponding to choosing the constraint set $A$ as the simplex in $\mathbb{R}^d$ and choosing
$F$ to be the unnormalized negentropy function of Example 26.5. In this case
follow the regularized leader chooses

$$a_{t+1} = \operatorname*{argmin}_{a \in A} \; \eta \sum_{s=1}^{t} \langle a, y_s \rangle + \sum_{i=1}^{d} \bigl( a_i \log(a_i) - a_i \bigr)\,.$$

You will show in Exercise 28.6 that

$$a_{t+1,i} = \frac{\exp\left(-\eta \sum_{s=1}^{t} y_{si}\right)}{\sum_{j=1}^{d} \exp\left(-\eta \sum_{s=1}^{t} y_{sj}\right)}\,. \tag{28.5}$$
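The closed form in Eq. (28.5) is straightforward to compute directly. The sketch below (function and variable names are illustrative, not from the text) evaluates the exponential-weights iterate from the matrix of observed losses, with the standard max-shift so the exponentials do not overflow; the shift cancels in the ratio, leaving the value of Eq. (28.5) unchanged.

```python
import numpy as np

def exp_weights(Y, eta):
    """Exponential weights iterate a_{t+1} per Eq. (28.5).

    Y   : array of shape (t, d), row s holds the loss vector y_s
    eta : learning rate
    """
    s = -eta * Y.sum(axis=0)  # -eta * sum_{s<=t} y_{si}, one entry per action
    s -= s.max()              # stability shift; cancels after normalization
    w = np.exp(s)
    return w / w.sum()        # normalize onto the simplex

# identical losses across actions give the uniform distribution
a = exp_weights(np.ones((3, 4)), eta=0.1)
```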


28.1.1 A two-step process for implementation


Solving the optimization problem in Eq. (28.2) is often made easier by using the
results in Section 26.6 of Chapter 26. Let $D^* = \nabla F(D)$ and suppose the following
condition holds:
$$\nabla F(x) - \eta y \in D^* \quad \text{for all } x \in A \cap D \text{ and } y \in L\,. \tag{28.6}$$
Then the solution to Eq. (28.2) can be found using the following two-step
procedure:
$$\tilde{a}_{t+1} = \operatorname*{argmin}_{a \in D} \; \eta \langle a, y_t \rangle + D_F(a, a_t) \quad \text{and} \tag{28.7}$$
$$a_{t+1} = \operatorname*{argmin}_{a \in A \cap D} D_F(a, \tilde{a}_{t+1})\,. \tag{28.8}$$
Eq. (28.6) means the first optimization problem can be evaluated explicitly as
the solution to
$$\eta y_t + \nabla F(\tilde{a}_{t+1}) - \nabla F(a_t) = 0\,. \tag{28.9}$$
Since $F$ is Legendre this means that $\tilde{a}_{t+1} = (\nabla F)^{-1}(\nabla F(a_t) - \eta y_t)$. The
optimization problem in Eq. (28.8) is usually harder to calculate analytically, but
there are important exceptions as we shall see.
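One such exception is the negentropy/simplex setup of Example 28.3, where both steps have closed forms: since $\nabla F(a) = \log a$, the unconstrained step in Eq. (28.9) gives $\tilde{a}_{t+1} = a_t \exp(-\eta y_t)$ coordinatewise, and the Bregman projection in Eq. (28.8) onto the simplex reduces to normalization. A minimal sketch under those assumptions (names are illustrative):

```python
import numpy as np

def md_step(a, y, eta):
    """One mirror-descent step with unnormalized negentropy on the simplex.

    Step 1 (Eqs. 28.7/28.9): grad F(a) = log a, so the unconstrained solution
    is a_tilde = (grad F)^{-1}(grad F(a) - eta*y) = a * exp(-eta*y).
    Step 2 (Eq. 28.8): the Bregman projection onto the simplex under
    negentropy is plain normalization.
    """
    a_tilde = a * np.exp(-eta * y)   # dual gradient step, mapped back to primal
    return a_tilde / a_tilde.sum()   # projection onto the simplex

# a step from the uniform distribution shifts mass away from the lossy action
a_next = md_step(np.full(3, 1 / 3), np.array([1.0, 0.0, 0.0]), eta=0.5)
```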

The condition in Eq. (28.6) holds for all choices of potential and losses in
this book.

28.2 Regret analysis


Although mirror descent and follow the regularized leader are not the same, the
bounds presented here are identical. The theorem for mirror descent has two