Bandit Algorithms

28.7 Exercises 325

(b) Show that ifFt=F/ηtand (ηt)nt=1+1is decreasing withηn=ηn+1, then

Rn(a)≤F(a)−minb∈AF(b) ηn

+

∑n

t=1

(

〈at−at+1,yt〉−DF(at+1,at) ηt

)

28.11(Anytime version of Exp3) Consider thek-armed adversarial bandit problem described in Chapter 11 where the adversary chooses (yt)nt=1 with yt∈[0,1]k. LetPt∈Pk− 1 be defined by

Pti=

exp

(

−ηt

∑t− 1 s=1Yˆsi

)

∑k j=1exp

(

−ηt

∑t− 1 s=1Yˆsj

),

where (ηt)∞t=1is an infinite sequence of learning rates andYˆti=I{At=i}yti/Pti
andAtis sampled fromPt.

(a)LetA=Pk− 1 be the simplex,Fbe the unnormalized negentropy potential,
Ft(p) =F(p)/ηtand Φt(p) =F(p)/ηt+

∑t− 1 s=1〈p,Yˆs〉. Show thatPtis the choice of follow the regularized leader with potentials (Ft)nt=1and losses (Yˆt)nt=1.

(b) Assume that (ηt)nt=1are decreasing and use Exercise 28.10 to show that

Rn≤

log(k) ηn

+E

[n ∑

t=1

〈Pt−Pt+1,Yˆt〉−

DF(Pt+1,Pt) ηt

]

.

(c)Use Theorem 26.12 in combination with the facts thatYˆti≥0 for alliand Yˆti= 0 unlessAt=ito show that

〈Pt−Pt+1,Yˆt〉−

DF(Pt+1,Pt) ηt

≤

ηt 2 PtAt

.

(d) Prove thatRn≤

log(k) ηn

+

k 2

∑n

t=1

ηt.

(e) Choose (ηt)∞t=1so thatRn≤ 2

√

nklog(k) for alln≥1.

28.12(The log barrier and first order bounds) Let (yt)nt=1be a sequence of loss vectors withyt∈[0,1]kfor alltandF(a) =−

∑k i=1log(ai). Consider the instance of FTRL for bandits that samplesAtfromPtdefined by

Pt= argminp∈Pk− 1 ηt

∑t−^1

s=1

〈p,Yˆs〉+F(p).

(a) Show that for appropriately tuned (ηt)nt=1,

Rn≤k+ 2

√√

k

(

1 +E

[n− 1 ∑

t=1

y^2 tAt

])

log

(

n∨k k

)

. (28.14)

Bandit Algorithms

+

(

)

(

)

(

),

+E

]

.

≤

.

+

√

√√

√√

(

1 +E

])

(

)

. (28.14)

Get our desktop app

Company

Features

Documentation

Resources