28.7 Exercises 325(b) Show that ifFt=F/ηtand (ηt)nt=1+1is decreasing withηn=ηn+1, then
Rn(a)≤F(a)−minb∈AF(b)
ηn+
∑nt=1(
〈at−at+1,yt〉−DF(at+1,at)
ηt)
28.11(Anytime version of Exp3) Consider thek-armed adversarial bandit
problem described in Chapter 11 where the adversary chooses (yt)nt=1 with
yt∈[0,1]k. LetPt∈Pk− 1 be defined byPti=exp(
−ηt∑t− 1
s=1Yˆsi)
∑k
j=1exp(
−ηt∑t− 1
s=1Yˆsj),
where (ηt)∞t=1is an infinite sequence of learning rates andYˆti=I{At=i}yti/Pti
andAtis sampled fromPt.
(a)LetA=Pk− 1 be the simplex,Fbe the unnormalized negentropy potential,
Ft(p) =F(p)/ηtand Φt(p) =F(p)/ηt+
∑t− 1
s=1〈p,Yˆs〉. Show thatPtis the
choice of follow the regularized leader with potentials (Ft)nt=1and losses
(Yˆt)nt=1.(b) Assume that (ηt)nt=1are decreasing and use Exercise 28.10 to show that
Rn≤log(k)
ηn+E
[n
∑t=1〈Pt−Pt+1,Yˆt〉−DF(Pt+1,Pt)
ηt]
.
(c)Use Theorem 26.12 in combination with the facts thatYˆti≥0 for alliand
Yˆti= 0 unlessAt=ito show that〈Pt−Pt+1,Yˆt〉−DF(Pt+1,Pt)
ηt≤
ηt
2 PtAt.
(d) Prove thatRn≤
log(k)
ηn+
k
2∑nt=1ηt.(e) Choose (ηt)∞t=1so thatRn≤ 2√
nklog(k) for alln≥1.28.12(The log barrier and first order bounds) Let (yt)nt=1be a sequence
of loss vectors withyt∈[0,1]kfor alltandF(a) =−∑k
i=1log(ai). Consider the
instance of FTRL for bandits that samplesAtfromPtdefined byPt= argminp∈Pk− 1 ηt∑t−^1s=1〈p,Yˆs〉+F(p).(a) Show that for appropriately tuned (ηt)nt=1,
Rn≤k+ 2√√
√√
k(
1 +E
[n− 1
∑t=1y^2 tAt])
log(
n∨k
k