37.6 Upper bound for easy games 4661 −δ,
Rˆn≤Rˆ′n+γn+√ 2 nlog(1/δ). (37.15)
By Lemma 37.18, the surrogate regret is bounded in terms of the local regret:Rˆ′n=∑nt=1〈`Bt−`a∗n,ut〉≤1
εGmaxφ∈H∑nt=1〈`Bt−`φ(Bt),ut〉. (37.16)We prepare to use Hoeffding-Azuma again. Fixφ∈Harbitrarily. Then,
Et− 1[
〈`Bt−`φ(Bt),ut〉]
=
∑
j∈AP ̃tj∑
a∈AQtja〈`a−`φ(j),ut〉=
∑
j∈AP ̃tj∑
a∈AQtja(yta−ytφ(j)),where we used the fact thatP ̃ta=
∑
jP ̃tjQtja. Hoeffding-Azuma’s inequality
now shows that with probability at least 1−δ/|H|,
∑nt=1〈`Bt−`φ(Bt),ut〉≤∑
j∈A∑nt=1P ̃tj∑
a∈AQtja(yta−ytφ(j)) +√
2 nlog(|H|/δ).The result is completed via a union bound over allφ∈ Hand chaining with
Eqs. (37.15) and (37.16), and noting that
maxφ∈H∑
j∈A∑nt=1P ̃tj∑
a∈AQtja(yta−ytφ(j))≤∑
j∈Amaxφ∈H∑nt=1P ̃tj∑
a∈AQtja(yta−ytφ(j))=
∑
j∈Amax
b∈Nj∩A∑nt=1P ̃tj∑
a∈AQtja(yta−ytb)
︸ ︷︷ ︸
Rˆnj. (37.17)
Proof of Theorem 37.15 The proof has two steps: Bounding the local regretRˆnj
for eachj∈A, and then merging the bounds.Step 1: Bounding the local regret
For this step fixj∈ A. The local regret can be massaged into a form suitable
for the application of the result of Exercise 12.2. LetZtja =P ̃tj(yta−ytj)
andGt=σ(A 1 ,...,At) andG= (Gt)nt=0be the associated filtration. A simple
rewriting shows thatRˆnj= max
b∈Nj∩A∑nt=1P ̃tj∑
a∈AQtja(yta−ytb) = max
b∈Nj∩A∑nt=1∑
a∈AQtja(Ztja−Ztjb).In order to apply the result in Exercise 12.2 we need to check the conditions.
Since (Pt)tand (P ̃t)tareG-predictable it follows that (βt)tand (Zt)tare alsoG-
predictable. Similarly, (Zˆt)tisG-adapted because (At)tand (Φt)tareG-adapted.
It remains to show that assumptions(a–d)are satisfied. For(a)leta∈Nj∩A.