37.6 Upper bound for easy games 465
twice that〈`a−`b,w〉= 0, we calculate
〈`a−`b,u〉=〈`a−`b,u−w〉=
‖u−w‖ 2
‖v−w‖ 2 〈`a−`b,w−v〉
=
‖u−w‖ 2
‖v−w‖ 2
〈`b−`a,v〉> 0 , (37.14)
where the second equality used thatw 6 =vis a point of the line segment connecting
vandu, hencew−vandu−ware parallel and share the same direction and
‖v−w‖ 2 >0 (see Fig. 37.4), and the last inequality follows becausevis the
centroid ofCaanda,bare distinct Pareto optimal actions.
Letvcbe the centroid ofCcfor anyc∈A. Then,
〈`a−` ̃a,u〉
〈`a−`b,u〉=
〈`a−`a ̃,w+u−w〉
〈`a−`b,u〉
(a)
≤
〈`a−`b,w〉+〈`a−`a ̃,u−w〉
〈`a−`b,u〉
(b)
=
〈`a−` ̃a,u−w〉
〈`a−`b,u〉
(c)
=
‖v−w‖ 2 〈`a−` ̃a,u−w〉
‖u−w‖ 2 〈`b−`a,v〉
(d)
≤
‖v−w‖ 2 ‖`a−` ̃a‖ 2
〈`b−`a,v〉
(e)
≤
√
2 d
minc∈Amine∈Nc\{c}〈`e−`c,vc〉
=
1
εG
,
where (a) follows since by(37.14),〈a−
b,u〉>0 and also becausew∈Cb
implies that〈a−
̃a,w〉 ≤ 〈a−
b,w〉, (b) follows since〈a−
b,w〉= 0,
which is used in other steps as well. (c) uses(37.14), (d) is by Cauchy-Schwarz
and in (e) we bounded‖w−v‖ 2 ≤
√
2 and used that‖`a−` ̃a‖ 2 ≤
√
dand
〈`b−`a,v〉=〈`b−`a,va〉 ≥minc∈Amine∈Nc\{c}〈`e−`c,vc〉>0. The final
equality serves as the definition of 1/εG.
Lemma37.18.LetHbe the set of functionsφ:A→Awithφ(a)∈Na∩Afor
alla∈ Aand definea∗n=argmina∈[k]
∑n
t=1〈`a,ut〉. Then for any sequence of
actions(Bt)nt=1inA,
∑n
t=1
〈`Bt−`a∗n,ut〉≤^1
εG
max
φ∈H
∑n
t=1
〈`Bt−`φ(Bt),ut〉.
Proof With no loss of generality, we can assume thata∗n∈ AbecauseAis a
maximal duplicate-free subset of Pareto optimal actions. Apply the previous
lemma on subsequences of rounds whereBt=afor eacha∈A.
Lemma37.19.Letδ∈(0,1). Then with probability at least 1 − 2 δit holds that
Rˆn≤γn+^1
εG
∑
j∈A
max
b∈Nj∩A
∑n
t=1
P ̃tj
∑
a∈A
Qtja(yta−ytb) +
√
8 nlog(|H|/δ).
Proof Fort∈[n], letBt∼P ̃t. Define the surrogate regretRˆ′n=
∑n
t=1〈`Bt−
`a∗n,ut〉. By the definition ofAtandBtand Lemma 37.14 (b) we haveEt− 1 [〈`At−
`Bt,ut〉]≤γ, whereEt− 1 [·] =E[·|A 1 ,B 1 ,...,At− 1 ,Bt− 1 ]. Furthermore,|〈`a−
`b,ut〉|≤1 for alla,b. Therefore, by Hoeffding–Azuma, with probability at least