28.7 Exercises 325
(b) Show that ifFt=F/ηtand (ηt)nt=1+1is decreasing withηn=ηn+1, then
Rn(a)≤F(a)−minb∈AF(b)
ηn
+
∑n
t=1
(
〈at−at+1,yt〉−DF(at+1,at)
ηt
)
28.11(Anytime version of Exp3) Consider thek-armed adversarial bandit
problem described in Chapter 11 where the adversary chooses (yt)nt=1 with
yt∈[0,1]k. LetPt∈Pk− 1 be defined by
Pti=
exp
(
−ηt
∑t− 1
s=1Yˆsi
)
∑k
j=1exp
(
−ηt
∑t− 1
s=1Yˆsj
),
where (ηt)∞t=1is an infinite sequence of learning rates andYˆti=I{At=i}yti/Pti
andAtis sampled fromPt.
(a)LetA=Pk− 1 be the simplex,Fbe the unnormalized negentropy potential,
Ft(p) =F(p)/ηtand Φt(p) =F(p)/ηt+
∑t− 1
s=1〈p,Yˆs〉. Show thatPtis the
choice of follow the regularized leader with potentials (Ft)nt=1and losses
(Yˆt)nt=1.
(b) Assume that (ηt)nt=1are decreasing and use Exercise 28.10 to show that
Rn≤
log(k)
ηn
+E
[n
∑
t=1
〈Pt−Pt+1,Yˆt〉−
DF(Pt+1,Pt)
ηt
]
.
(c)Use Theorem 26.12 in combination with the facts thatYˆti≥0 for alliand
Yˆti= 0 unlessAt=ito show that
〈Pt−Pt+1,Yˆt〉−
DF(Pt+1,Pt)
ηt
≤
ηt
2 PtAt
.
(d) Prove thatRn≤
log(k)
ηn
+
k
2
∑n
t=1
ηt.
(e) Choose (ηt)∞t=1so thatRn≤ 2
√
nklog(k) for alln≥1.
28.12(The log barrier and first order bounds) Let (yt)nt=1be a sequence
of loss vectors withyt∈[0,1]kfor alltandF(a) =−
∑k
i=1log(ai). Consider the
instance of FTRL for bandits that samplesAtfromPtdefined by
Pt= argminp∈Pk− 1 ηt
∑t−^1
s=1
〈p,Yˆs〉+F(p).
(a) Show that for appropriately tuned (ηt)nt=1,
Rn≤k+ 2
√√
√√
k
(
1 +E
[n− 1
∑
t=1
y^2 tAt
])
log
(
n∨k
k