Bandit Algorithms


which may be a different choice than mirror descent.
Example 28.3. The exponential weights algorithm that appeared in various
forms on numerous occasions in earlier chapters is a special case of mirror descent,
corresponding to choosing the constraint set $A$ as the simplex in $\mathbb{R}^d$ and choosing
$F$ to be the unnormalized negentropy function of Example 26.5. In this case
follow the regularized leader chooses

$$a_{t+1} = \operatorname*{argmin}_{a \in A} \; \eta \sum_{s=1}^{t} \langle a, y_s \rangle + \sum_{i=1}^{d} \bigl( a_i \log(a_i) - a_i \bigr)\,.$$

You will show in Exercise 28.6 that

$$a_{t+1,i} = \frac{\exp\left(-\eta \sum_{s=1}^{t} y_{si}\right)}{\sum_{j=1}^{d} \exp\left(-\eta \sum_{s=1}^{t} y_{sj}\right)}\,. \tag{28.5}$$
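The closed form in Eq. (28.5) is straightforward to compute directly. The sketch below (function and variable names are illustrative, not from the text) evaluates the exponential-weights iterate from the matrix of observed losses, with the standard max-shift so the exponentials do not overflow; the shift cancels in the ratio, leaving the value of Eq. (28.5) unchanged.

```python
import numpy as np

def exp_weights(Y, eta):
    """Exponential weights iterate a_{t+1} per Eq. (28.5).

    Y   : array of shape (t, d), row s holds the loss vector y_s
    eta : learning rate
    """
    s = -eta * Y.sum(axis=0)  # -eta * sum_{s<=t} y_{si}, one entry per action
    s -= s.max()              # stability shift; cancels after normalization
    w = np.exp(s)
    return w / w.sum()        # normalize onto the simplex

# identical losses across actions give the uniform distribution
a = exp_weights(np.ones((3, 4)), eta=0.1)
```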


28.1.1 A two-step process for implementation


Solving the optimization problem in Eq. (28.2) is often made easier by using the
results in Section 26.6 of Chapter 26. Let $D^* = \nabla F(D)$ and suppose the following
condition holds:
$$\nabla F(x) - \eta y \in D^* \quad \text{for all } x \in A \cap D \text{ and } y \in L\,. \tag{28.6}$$
Then the solution to Eq. (28.2) can be found using the following two-step
procedure:
$$\tilde{a}_{t+1} = \operatorname*{argmin}_{a \in D} \; \eta \langle a, y_t \rangle + D_F(a, a_t) \quad \text{and} \tag{28.7}$$
$$a_{t+1} = \operatorname*{argmin}_{a \in A \cap D} D_F(a, \tilde{a}_{t+1})\,. \tag{28.8}$$
Eq. (28.6) means the first optimization problem can be evaluated explicitly as
the solution to
$$\eta y_t + \nabla F(\tilde{a}_{t+1}) - \nabla F(a_t) = 0\,. \tag{28.9}$$
Since $F$ is Legendre this means that $\tilde{a}_{t+1} = (\nabla F)^{-1}(\nabla F(a_t) - \eta y_t)$. The
optimization problem in Eq. (28.8) is usually harder to calculate analytically, but
there are important exceptions as we shall see.
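One such exception is the negentropy/simplex setup of Example 28.3, where both steps have closed forms: since $\nabla F(a) = \log a$, the unconstrained step in Eq. (28.9) gives $\tilde{a}_{t+1} = a_t \exp(-\eta y_t)$ coordinatewise, and the Bregman projection in Eq. (28.8) onto the simplex reduces to normalization. A minimal sketch under those assumptions (names are illustrative):

```python
import numpy as np

def md_step(a, y, eta):
    """One mirror-descent step with unnormalized negentropy on the simplex.

    Step 1 (Eqs. 28.7/28.9): grad F(a) = log a, so the unconstrained solution
    is a_tilde = (grad F)^{-1}(grad F(a) - eta*y) = a * exp(-eta*y).
    Step 2 (Eq. 28.8): the Bregman projection onto the simplex under
    negentropy is plain normalization.
    """
    a_tilde = a * np.exp(-eta * y)   # dual gradient step, mapped back to primal
    return a_tilde / a_tilde.sum()   # projection onto the simplex

# a step from the uniform distribution shifts mass away from the lossy action
a_next = md_step(np.full(3, 1 / 3), np.array([1.0, 0.0, 0.0]), eta=0.5)
```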

The condition in Eq. (28.6) holds for all choices of potential and losses in
this book.

28.2 Regret analysis


Although mirror descent and follow the regularized leader are not the same, the
bounds presented here are identical. The theorem for mirror descent has two