37.6 Upper bound for easy games 466
1 −δ,
Rˆn≤Rˆ′n+γn+√ 2 nlog(1/δ). (37.15)
By Lemma 37.18, the surrogate regret is bounded in terms of the local regret:
Rˆ′n=
∑n
t=1
〈`Bt−`a∗n,ut〉≤
1
εG
maxφ∈H
∑n
t=1
〈`Bt−`φ(Bt),ut〉. (37.16)
We prepare to use Hoeffding-Azuma again. Fixφ∈Harbitrarily. Then,
Et− 1
[
〈`Bt−`φ(Bt),ut〉
]
=
∑
j∈A
P ̃tj
∑
a∈A
Qtja〈`a−`φ(j),ut〉
=
∑
j∈A
P ̃tj
∑
a∈A
Qtja(yta−ytφ(j)),
where we used the fact thatP ̃ta=
∑
jP ̃tjQtja. Hoeffding-Azuma’s inequality
now shows that with probability at least 1−δ/|H|,
∑n
t=1
〈`Bt−`φ(Bt),ut〉≤
∑
j∈A
∑n
t=1
P ̃tj
∑
a∈A
Qtja(yta−ytφ(j)) +
√
2 nlog(|H|/δ).
The result is completed via a union bound over allφ∈ Hand chaining with
Eqs. (37.15) and (37.16), and noting that
maxφ∈H
∑
j∈A
∑n
t=1
P ̃tj
∑
a∈A
Qtja(yta−ytφ(j))≤
∑
j∈A
maxφ∈H
∑n
t=1
P ̃tj
∑
a∈A
Qtja(yta−ytφ(j))
=
∑
j∈A
max
b∈Nj∩A
∑n
t=1
P ̃tj
∑
a∈A
Qtja(yta−ytb)
︸ ︷︷ ︸
Rˆnj
. (37.17)
Proof of Theorem 37.15 The proof has two steps: Bounding the local regretRˆnj
for eachj∈A, and then merging the bounds.
Step 1: Bounding the local regret
For this step fixj∈ A. The local regret can be massaged into a form suitable
for the application of the result of Exercise 12.2. LetZtja =P ̃tj(yta−ytj)
andGt=σ(A 1 ,...,At) andG= (Gt)nt=0be the associated filtration. A simple
rewriting shows that
Rˆnj= max
b∈Nj∩A
∑n
t=1
P ̃tj
∑
a∈A
Qtja(yta−ytb) = max
b∈Nj∩A
∑n
t=1
∑
a∈A
Qtja(Ztja−Ztjb).
In order to apply the result in Exercise 12.2 we need to check the conditions.
Since (Pt)tand (P ̃t)tareG-predictable it follows that (βt)tand (Zt)tare alsoG-
predictable. Similarly, (Zˆt)tisG-adapted because (At)tand (Φt)tareG-adapted.
It remains to show that assumptions(a–d)are satisfied. For(a)leta∈Nj∩A.