Bandit Algorithms

28.7 Exercises

(b) What happens if you replace mirror descent by follow the regularized leader?
\[
  P_{t+1} = \operatorname*{argmin}_{p \in \mathcal{A}} \, \eta \sum_{s=1}^{t} \langle p, \hat Y_s \rangle + F(p)\,.
\]
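A useful sanity check on the answer is to compute the minimizer in the concrete Exp3 instance. Below is a minimal numerical sketch (not part of the exercise), assuming $F$ is the unnormalized negentropy $F(p) = \sum_{i=1}^{k} (p_i \log p_i - p_i)$ and $\mathcal{A} = \mathcal{P}_{k-1}$: the follow-the-regularized-leader minimizer is then $P_{t+1,i} \propto \exp(-\eta \sum_{s \le t} \hat Y_{si})$, which coincides with the mirror descent iterate.

```python
import numpy as np

# Sketch: with F the unnormalized negentropy and A the simplex, FTRL on the
# cumulative loss estimates matches mirror descent (a multiplicative update
# followed by normalization); both are Exp3's exponential-weights rule.
rng = np.random.default_rng(1)
n, k, eta = 50, 5, 0.1
Y = rng.exponential(size=(n, k))   # stand-in nonnegative loss estimates

p_md = np.full(k, 1.0 / k)         # mirror descent iterate, started uniform
cum = np.zeros(k)                  # cumulative losses for FTRL
for y in Y:
    cum += y
    w = np.exp(-eta * cum)         # FTRL: argmin_p eta*<p, cum> + F(p)
    p_ftrl = w / w.sum()
    q = p_md * np.exp(-eta * y)    # mirror descent step in the dual
    p_md = q / q.sum()             # Bregman projection back onto the simplex
    assert np.allclose(p_ftrl, p_md)
```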

28.8 (Exp3 as mirror descent (ii)) Here you will show that the tools in this chapter not only lead to the same algorithm, but also to the same bounds.

(a) Let $\tilde P_{t+1} = \operatorname*{argmin}_{p \in [0,\infty)^k} \eta \langle p, \hat Y_t \rangle + D_F(p, P_t)$. Show both relations in the following display:


\[
  D_F(P_t, \tilde P_{t+1})
  = \sum_{i=1}^{k} P_{ti} \left( \exp(-\eta \hat Y_{ti}) - 1 + \eta \hat Y_{ti} \right)
  \le \frac{\eta^2}{2} \sum_{i=1}^{k} P_{ti} \hat Y_{ti}^2\,.
\]
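For the second relation, one possible route (a hint only, assuming the estimates $\hat Y_{ti}$ are nonnegative, as importance-weighted loss estimates are) is the elementary bound
\[
  e^{-x} - 1 + x = \int_0^x \bigl(1 - e^{-u}\bigr)\,\mathrm{d}u
  \le \int_0^x u\,\mathrm{d}u = \frac{x^2}{2}
  \qquad (x \ge 0)\,,
\]
applied termwise with $x = \eta \hat Y_{ti}$.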

(b) Show that
\[
  \frac{1}{\eta} \, \mathbb{E}\left[ \sum_{t=1}^{n} D_F(P_t, \tilde P_{t+1}) \right] \le \frac{\eta n k}{2}\,.
\]


(c) Show that $\operatorname{diam}_F(\mathcal{P}_{k-1}) = \log(k)$.
(d) Conclude that for appropriately tuned $\eta > 0$ the regret of Exp3 satisfies
\[
  R_n \le \sqrt{2 n k \log(k)}\,.
\]
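For (d), combining (b) and (c) with the mirror descent regret bound of this chapter gives $R_n \le \log(k)/\eta + \eta n k/2$, and $\eta = \sqrt{2\log(k)/(nk)}$ balances the two terms, yielding the display above. As a numerical sanity check on the display in (a), here is a small sketch, again assuming $F$ is the unnormalized negentropy so that $\tilde P_{t+1,i} = P_{ti} \exp(-\eta \hat Y_{ti})$ and $D_F(p, q) = \sum_i (p_i \log(p_i/q_i) - p_i + q_i)$:

```python
import numpy as np

rng = np.random.default_rng(0)
k, eta = 10, 0.1
P = rng.dirichlet(np.ones(k))      # current distribution P_t
Y = rng.exponential(size=k)        # nonnegative loss estimates hat-Y_t
Ptilde = P * np.exp(-eta * Y)      # unprojected mirror descent step

def bregman(p, q):
    # Bregman divergence induced by F(p) = sum_i (p_i log p_i - p_i)
    return np.sum(p * np.log(p / q) - p + q)

lhs = bregman(P, Ptilde)
mid = np.sum(P * (np.exp(-eta * Y) - 1 + eta * Y))
rhs = 0.5 * eta**2 * np.sum(P * Y**2)
assert np.isclose(lhs, mid) and mid <= rhs   # both relations hold here
```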

28.9 (Mirror descent and changing learning rates) Let $\mathcal{A}$ be closed and convex and $y_1, \ldots, y_n \in \mathcal{L} \subseteq \mathbb{R}^d$. Let $F$ be Legendre with domain $\mathcal{D}$ and assume that $\mathcal{A} \subseteq \operatorname{cl}(\mathcal{D})$ and that Eq. (28.6) holds. Let $\eta_0, \eta_1, \ldots, \eta_n$ be a sequence of learning rates where we assume that $\eta_0 = \infty$, and let $a_1 = \operatorname*{argmin}_{a \in \mathcal{A}} F(a)$ and
\begin{align*}
  \tilde a_{t+1} &= \operatorname*{argmin}_{a \in \mathcal{D}} \, \eta_t \langle a, y_t \rangle + D_F(a, a_t)\,, \\
  a_{t+1} &= \operatorname*{argmin}_{a \in \mathcal{A} \cap \mathcal{D}} D_F(a, \tilde a_{t+1})\,.
\end{align*}

Show that for all $a \in \mathcal{A}$:

(a)
\[
  R_n(a) = \sum_{t=1}^{n} \langle a_t - a, y_t \rangle
  \le \sum_{t=1}^{n} \frac{D_F(a_t, \tilde a_{t+1})}{\eta_t}
  + \sum_{t=1}^{n} \frac{D_F(a, a_t) - D_F(a, \tilde a_{t+1})}{\eta_t}\,.
\]


(b)
\[
  R_n(a) \le \sum_{t=1}^{n} \frac{D_F(a_t, \tilde a_{t+1})}{\eta_t}
  + \sum_{t=1}^{n} D_F(a, a_t) \left( \frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} \right).
\]
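To make the two-step update concrete, here is a minimal sketch under an assumed instantiation: $F$ is the unnormalized negentropy and $\mathcal{A}$ is the probability simplex, in which case the Bregman projection in the second step reduces to normalization. The function name and interface are illustrative only.

```python
import numpy as np

def md_changing_rates(losses, etas):
    """Two-step mirror descent with per-round learning rates eta_t,
    instantiated for F = unnormalized negentropy on the simplex."""
    n, d = losses.shape
    a = np.full(d, 1.0 / d)        # a_1 = argmin_{a in A} F(a) is uniform
    played = []
    for t in range(n):
        played.append(a.copy())
        a_tilde = a * np.exp(-etas[t] * losses[t])  # unconstrained step in D
        a = a_tilde / a_tilde.sum()                 # projection onto A
    return np.array(played)
```

With nonincreasing rates, $1/\eta_t - 1/\eta_{t-1} \ge 0$, so when $D_F(a, a_t) \le \operatorname{diam}_F(\mathcal{A})$ the second sum in (b) is at most $\operatorname{diam}_F(\mathcal{A})/\eta_n$, since the rate differences telescope and $1/\eta_0 = 0$; this is the usual way changing rates are tuned.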


28.10 (FTRL and changing potentials) As in the previous exercise, let $\mathcal{A}$ be closed and convex and $y_1, \ldots, y_n \in \mathcal{L} \subseteq \mathbb{R}^d$. Let $F_1, \ldots, F_n, F_{n+1}$ be a sequence of Legendre functions with $\operatorname{dom}(F_t) = \mathcal{D}_t$ and $\mathcal{A} \subseteq \operatorname{cl}(\mathcal{D}_t)$ for all $t$. Let
\[
  \Phi_t(a) = F_t(a) + \sum_{s=1}^{t-1} \langle a, y_s \rangle
  \qquad \text{and} \qquad
  a_t = \operatorname*{argmin}_{a \in \mathcal{A}} \Phi_t(a)\,.
\]

(a) Show that
\[
  R_n(a) \le \sum_{t=1}^{n} \left( \langle a_t - a_{t+1}, y_t \rangle - D_{F_t}(a_{t+1}, a_t) \right)
  + F_{n+1}(a) - F_1(a_1)
  + \sum_{t=1}^{n} \left( F_t(a_{t+1}) - F_{t+1}(a_{t+1}) \right).
\]
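One natural special case (an illustrative assumption, not the exercise's full generality) takes $F_t = F/\eta_t$ for a fixed Legendre $F$ and a nonincreasing rate sequence, so that changing potentials encode changing learning rates. Here is a minimal sketch for $F$ the unnormalized negentropy on the simplex, where $\Phi_t$ has a closed-form minimizer; the function name is hypothetical.

```python
import numpy as np

def ftrl_changing_potentials(losses, etas):
    """FTRL with potentials F_t = F / eta_t, F the unnormalized negentropy
    on the simplex.  Minimizing Phi_t(a) = F(a)/eta_t + <a, sum_{s<t} y_s>
    over the simplex gives a_t,i proportional to exp(-eta_t * L_i), with L
    the cumulative loss before round t."""
    n, d = losses.shape
    cum = np.zeros(d)              # cumulative loss sum_{s<t} y_s
    played = []
    for t in range(n):
        w = np.exp(-etas[t] * cum) # closed-form minimizer of Phi_t
        played.append(w / w.sum())
        cum += losses[t]
    return np.array(played)
```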