Bandit Algorithms

28.7 Exercises

(b) What happens if you replace mirror descent by follow the regularized leader?
\[
  P_{t+1} = \operatorname*{argmin}_{p \in \mathcal{A}} \, \eta \sum_{s=1}^{t} \langle p, \hat Y_s \rangle + F(p)\,.
\]
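A useful sanity check on the answer is to compute the minimizer in the concrete Exp3 instance. Below is a minimal numerical sketch (not part of the exercise), assuming $F$ is the unnormalized negentropy $F(p) = \sum_{i=1}^{k} (p_i \log p_i - p_i)$ and $\mathcal{A} = \mathcal{P}_{k-1}$: the follow-the-regularized-leader minimizer is then $P_{t+1,i} \propto \exp(-\eta \sum_{s \le t} \hat Y_{si})$, which coincides with the mirror descent iterate.

```python
import numpy as np

# Sketch: with F the unnormalized negentropy and A the simplex, FTRL on the
# cumulative loss estimates matches mirror descent (a multiplicative update
# followed by normalization); both are Exp3's exponential-weights rule.
rng = np.random.default_rng(1)
n, k, eta = 50, 5, 0.1
Y = rng.exponential(size=(n, k))   # stand-in nonnegative loss estimates

p_md = np.full(k, 1.0 / k)         # mirror descent iterate, started uniform
cum = np.zeros(k)                  # cumulative losses for FTRL
for y in Y:
    cum += y
    w = np.exp(-eta * cum)         # FTRL: argmin_p eta*<p, cum> + F(p)
    p_ftrl = w / w.sum()
    q = p_md * np.exp(-eta * y)    # mirror descent step in the dual
    p_md = q / q.sum()             # Bregman projection back onto the simplex
    assert np.allclose(p_ftrl, p_md)
```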

28.8 (Exp3 as mirror descent (ii)) Here you will show that the tools in this chapter not only lead to the same algorithm, but also to the same bounds.

(a) Let $\tilde P_{t+1} = \operatorname*{argmin}_{p \in [0,\infty)^k} \eta \langle p, \hat Y_t \rangle + D_F(p, P_t)$. Show both relations in the following display:


\[
  D_F(P_t, \tilde P_{t+1})
  = \sum_{i=1}^{k} P_{ti} \left( \exp(-\eta \hat Y_{ti}) - 1 + \eta \hat Y_{ti} \right)
  \le \frac{\eta^2}{2} \sum_{i=1}^{k} P_{ti} \hat Y_{ti}^2\,.
\]
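For the second relation, one possible route (a hint only, assuming the estimates $\hat Y_{ti}$ are nonnegative, as importance-weighted loss estimates are) is the elementary bound
\[
  e^{-x} - 1 + x = \int_0^x \bigl(1 - e^{-u}\bigr)\,\mathrm{d}u
  \le \int_0^x u\,\mathrm{d}u = \frac{x^2}{2}
  \qquad (x \ge 0)\,,
\]
applied termwise with $x = \eta \hat Y_{ti}$.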

(b) Show that
\[
  \frac{1}{\eta} \, \mathbb{E}\left[ \sum_{t=1}^{n} D_F(P_t, \tilde P_{t+1}) \right] \le \frac{\eta n k}{2}\,.
\]


(c) Show that $\operatorname{diam}_F(\mathcal{P}_{k-1}) = \log(k)$.
(d) Conclude that for appropriately tuned $\eta > 0$ the regret of Exp3 satisfies
\[
  R_n \le \sqrt{2 n k \log(k)}\,.
\]
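For (d), combining (b) and (c) with the mirror descent regret bound of this chapter gives $R_n \le \log(k)/\eta + \eta n k/2$, and $\eta = \sqrt{2\log(k)/(nk)}$ balances the two terms, yielding the display above. As a numerical sanity check on the display in (a), here is a small sketch, again assuming $F$ is the unnormalized negentropy so that $\tilde P_{t+1,i} = P_{ti} \exp(-\eta \hat Y_{ti})$ and $D_F(p, q) = \sum_i (p_i \log(p_i/q_i) - p_i + q_i)$:

```python
import numpy as np

rng = np.random.default_rng(0)
k, eta = 10, 0.1
P = rng.dirichlet(np.ones(k))      # current distribution P_t
Y = rng.exponential(size=k)        # nonnegative loss estimates hat-Y_t
Ptilde = P * np.exp(-eta * Y)      # unprojected mirror descent step

def bregman(p, q):
    # Bregman divergence induced by F(p) = sum_i (p_i log p_i - p_i)
    return np.sum(p * np.log(p / q) - p + q)

lhs = bregman(P, Ptilde)
mid = np.sum(P * (np.exp(-eta * Y) - 1 + eta * Y))
rhs = 0.5 * eta**2 * np.sum(P * Y**2)
assert np.isclose(lhs, mid) and mid <= rhs   # both relations hold here
```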

28.9 (Mirror descent and changing learning rates) Let $\mathcal{A}$ be closed and convex and $y_1, \ldots, y_n \in \mathcal{L} \subseteq \mathbb{R}^d$. Let $F$ be Legendre with domain $\mathcal{D}$ and assume that $\mathcal{A} \subseteq \operatorname{cl}(\mathcal{D})$ and that Eq. (28.6) holds. Let $\eta_0, \eta_1, \ldots, \eta_n$ be a sequence of learning rates where we assume that $\eta_0 = \infty$, and let $a_1 = \operatorname*{argmin}_{a \in \mathcal{A}} F(a)$ and
\begin{align*}
  \tilde a_{t+1} &= \operatorname*{argmin}_{a \in \mathcal{D}} \, \eta_t \langle a, y_t \rangle + D_F(a, a_t)\,, \\
  a_{t+1} &= \operatorname*{argmin}_{a \in \mathcal{A} \cap \mathcal{D}} D_F(a, \tilde a_{t+1})\,.
\end{align*}

Show that for all $a \in \mathcal{A}$:

(a)
\[
  R_n(a) = \sum_{t=1}^{n} \langle a_t - a, y_t \rangle
  \le \sum_{t=1}^{n} \frac{D_F(a_t, \tilde a_{t+1})}{\eta_t}
  + \sum_{t=1}^{n} \frac{D_F(a, a_t) - D_F(a, \tilde a_{t+1})}{\eta_t}\,.
\]


(b)
\[
  R_n(a) \le \sum_{t=1}^{n} \frac{D_F(a_t, \tilde a_{t+1})}{\eta_t}
  + \sum_{t=1}^{n} D_F(a, a_t) \left( \frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} \right).
\]
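To make the two-step update concrete, here is a minimal sketch under an assumed instantiation: $F$ is the unnormalized negentropy and $\mathcal{A}$ is the probability simplex, in which case the Bregman projection in the second step reduces to normalization. The function name and interface are illustrative only.

```python
import numpy as np

def md_changing_rates(losses, etas):
    """Two-step mirror descent with per-round learning rates eta_t,
    instantiated for F = unnormalized negentropy on the simplex."""
    n, d = losses.shape
    a = np.full(d, 1.0 / d)        # a_1 = argmin_{a in A} F(a) is uniform
    played = []
    for t in range(n):
        played.append(a.copy())
        a_tilde = a * np.exp(-etas[t] * losses[t])  # unconstrained step in D
        a = a_tilde / a_tilde.sum()                 # projection onto A
    return np.array(played)
```

With nonincreasing rates, $1/\eta_t - 1/\eta_{t-1} \ge 0$, so when $D_F(a, a_t) \le \operatorname{diam}_F(\mathcal{A})$ the second sum in (b) is at most $\operatorname{diam}_F(\mathcal{A})/\eta_n$, since the rate differences telescope and $1/\eta_0 = 0$; this is the usual way changing rates are tuned.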


28.10 (FTRL and changing potentials) As in the previous exercise, let $\mathcal{A}$ be closed and convex and $y_1, \ldots, y_n \in \mathcal{L} \subseteq \mathbb{R}^d$. Let $F_1, \ldots, F_n, F_{n+1}$ be a sequence of Legendre functions with $\operatorname{dom}(F_t) = \mathcal{D}_t$ and $\mathcal{A} \subseteq \operatorname{cl}(\mathcal{D}_t)$ for all $t$. Let
\[
  \Phi_t(a) = F_t(a) + \sum_{s=1}^{t-1} \langle a, y_s \rangle
  \qquad \text{and} \qquad
  a_t = \operatorname*{argmin}_{a \in \mathcal{A}} \Phi_t(a)\,.
\]

(a) Show that
\[
  R_n(a) \le \sum_{t=1}^{n} \left( \langle a_t - a_{t+1}, y_t \rangle - D_{F_t}(a_{t+1}, a_t) \right)
  + F_{n+1}(a) - F_1(a_1)
  + \sum_{t=1}^{n} \left( F_t(a_{t+1}) - F_{t+1}(a_{t+1}) \right).
\]
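One natural special case (an illustrative assumption, not the exercise's full generality) takes $F_t = F/\eta_t$ for a fixed Legendre $F$ and a nonincreasing rate sequence, so that changing potentials encode changing learning rates. Here is a minimal sketch for $F$ the unnormalized negentropy on the simplex, where $\Phi_t$ has a closed-form minimizer; the function name is hypothetical.

```python
import numpy as np

def ftrl_changing_potentials(losses, etas):
    """FTRL with potentials F_t = F / eta_t, F the unnormalized negentropy
    on the simplex.  Minimizing Phi_t(a) = F(a)/eta_t + <a, sum_{s<t} y_s>
    over the simplex gives a_t,i proportional to exp(-eta_t * L_i), with L
    the cumulative loss before round t."""
    n, d = losses.shape
    cum = np.zeros(d)              # cumulative loss sum_{s<t} y_s
    played = []
    for t in range(n):
        w = np.exp(-etas[t] * cum) # closed-form minimizer of Phi_t
        played.append(w / w.sum())
        cum += losses[t]
    return np.array(played)
```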