Bandit Algorithms

28.2 Regret analysis 314

fora 1 = argminb∈AF(b). To see the second part note that

〈at−at+1,yt〉=

1

η〈at−at+1,∇F(at)−∇F( ̃at+1)〉 =^1 η

(D(at+1,at) +D(at, ̃at+1)−D(at+1, ̃at+1))

≤

1

η

(D(at+1,at) +D(at, ̃at+1)).

The result follows by substituting this into Eq. (28.10) and completing as for the first part.

The assumption thata 1 minimizes the potential was only used to bound D(a,a 1 )≤F(a)−F(a 1 ). For a different initialization the following bound still holds:

Rn(a)≤

1

η

(

D(a,a 1 ) +

∑n

t=1

D(at,a ̃t+1)

)

. (28.11)

As we shall see in Chapter 31, this is useful when using mirror descent to analyze nonstationary bandits.

The first part of Theorem 28.4 also holds for follow the regularized leader as stated in the next result, the proof of which is left for Exercise 28.3.

Theorem28.5. Letη > 0 andF be Legendre with domainDandA ⊆cl(D).
Then for anya∈Athe regret of fol low the regularized leader is bounded by

Rn(a)≤

F(a)−F(a 1 ) η

+

∑n

t=1

〈at−at+1,yt〉−

1

η

∑n

t=1

D(at+1,at).

We now give two applications of Theorem 28.4. Let diamF(A) = maxa,b∈AF(a)−F(b) be the diameter ofAwith respect toF.

Proposition28.6.LetA=Bd 2 ={a∈Rd:‖a‖ 2 ≤ 1 }be the standard unit bal l and assumeyt∈Bd 2 for allt. Then mirror descent with potentialF(a) =^12 ‖a‖^22 andη=

√

1 /nsatisfiesRn≤

√

n.

Proof By Eq. (28.9) we have ̃at+1=at−ηytso

D(at, ̃at+1) =

1

2

‖ ̃at+1−at‖^22 =

η^2 2

‖yt‖^22.

Therefore since diamF(A) = 1/2 and‖yt‖ 2 ≤1 for allt,

Rn≤

diamF(A) η +

η 2

∑n

t=1

‖yt‖^22 ≤

1

2 η+

ηn 2 =

√

n.

Proposition28.7.LetA=Pd− 1 be the probability simplex andyt∈L= [0,1]d

Bandit Algorithms

1

1

1

(

)

. (28.11)

+

1

√

√

1

2

1

√

Get our desktop app

Company

Features

Documentation

Resources