Bandit Algorithms

(Jeff_L) #1
28.7 Exercises 325

(b) Show that ifFt=F/ηtand (ηt)nt=1+1is decreasing withηn=ηn+1, then


Rn(a)≤F(a)−minb∈AF(b)
ηn

+


∑n

t=1

(


〈at−at+1,yt〉−DF(at+1,at)
ηt

)


28.11(Anytime version of Exp3) Consider thek-armed adversarial bandit
problem described in Chapter 11 where the adversary chooses (yt)nt=1 with
yt∈[0,1]k. LetPt∈Pk− 1 be defined by

Pti=

exp

(


−ηt

∑t− 1
s=1Yˆsi

)


∑k
j=1exp

(


−ηt

∑t− 1
s=1Yˆsj

),


where (ηt)∞t=1is an infinite sequence of learning rates andYˆti=I{At=i}yti/Pti
andAtis sampled fromPt.


(a)LetA=Pk− 1 be the simplex,Fbe the unnormalized negentropy potential,
Ft(p) =F(p)/ηtand Φt(p) =F(p)/ηt+


∑t− 1
s=1〈p,Yˆs〉. Show thatPtis the
choice of follow the regularized leader with potentials (Ft)nt=1and losses
(Yˆt)nt=1.

(b) Assume that (ηt)nt=1are decreasing and use Exercise 28.10 to show that


Rn≤

log(k)
ηn

+E


[n

t=1

〈Pt−Pt+1,Yˆt〉−

DF(Pt+1,Pt)
ηt

]


.


(c)Use Theorem 26.12 in combination with the facts thatYˆti≥0 for alliand
Yˆti= 0 unlessAt=ito show that

〈Pt−Pt+1,Yˆt〉−

DF(Pt+1,Pt)
ηt


ηt
2 PtAt

.


(d) Prove thatRn≤


log(k)
ηn

+


k
2

∑n

t=1

ηt.

(e) Choose (ηt)∞t=1so thatRn≤ 2


nklog(k) for alln≥1.

28.12(The log barrier and first order bounds) Let (yt)nt=1be a sequence
of loss vectors withyt∈[0,1]kfor alltandF(a) =−

∑k
i=1log(ai). Consider the
instance of FTRL for bandits that samplesAtfromPtdefined by

Pt= argminp∈Pk− 1 ηt

∑t−^1

s=1

〈p,Yˆs〉+F(p).

(a) Show that for appropriately tuned (ηt)nt=1,


Rn≤k+ 2

√√


√√


k

(


1 +E


[n− 1

t=1

y^2 tAt

])


log

(


n∨k
k

)


. (28.14)

Free download pdf