37.6 Upper bound for easy games 465twice that〈`a−`b,w〉= 0, we calculate〈`a−`b,u〉=〈`a−`b,u−w〉=‖u−w‖ 2
‖v−w‖ 2 〈`a−`b,w−v〉=‖u−w‖ 2
‖v−w‖ 2〈`b−`a,v〉> 0 , (37.14)where the second equality used thatw 6 =vis a point of the line segment connecting
vandu, hencew−vandu−ware parallel and share the same direction and
‖v−w‖ 2 >0 (see Fig. 37.4), and the last inequality follows becausevis the
centroid ofCaanda,bare distinct Pareto optimal actions.
Letvcbe the centroid ofCcfor anyc∈A. Then,
〈`a−` ̃a,u〉
〈`a−`b,u〉=〈`a−`a ̃,w+u−w〉
〈`a−`b,u〉(a)
≤〈`a−`b,w〉+〈`a−`a ̃,u−w〉
〈`a−`b,u〉
(b)
=〈`a−` ̃a,u−w〉
〈`a−`b,u〉(c)
=‖v−w‖ 2 〈`a−` ̃a,u−w〉
‖u−w‖ 2 〈`b−`a,v〉
(d)
≤‖v−w‖ 2 ‖`a−` ̃a‖ 2
〈`b−`a,v〉(e)
≤√
2 d
minc∈Amine∈Nc\{c}〈`e−`c,vc〉=
1
εG,
where (a) follows since by(37.14),〈a−b,u〉>0 and also becausew∈Cb
implies that〈a− ̃a,w〉 ≤ 〈a−b,w〉, (b) follows since〈a−b,w〉= 0,
which is used in other steps as well. (c) uses(37.14), (d) is by Cauchy-Schwarz
and in (e) we bounded‖w−v‖ 2 ≤
√
2 and used that‖`a−` ̃a‖ 2 ≤√
dand
〈`b−`a,v〉=〈`b−`a,va〉 ≥minc∈Amine∈Nc\{c}〈`e−`c,vc〉>0. The final
equality serves as the definition of 1/εG.Lemma37.18.LetHbe the set of functionsφ:A→Awithφ(a)∈Na∩Afor
alla∈ Aand definea∗n=argmina∈[k]
∑n
t=1〈`a,ut〉. Then for any sequence of
actions(Bt)nt=1inA,
∑nt=1〈`Bt−`a∗n,ut〉≤^1
εGmax
φ∈H∑nt=1〈`Bt−`φ(Bt),ut〉.Proof With no loss of generality, we can assume thata∗n∈ AbecauseAis a
maximal duplicate-free subset of Pareto optimal actions. Apply the previous
lemma on subsequences of rounds whereBt=afor eacha∈A.Lemma37.19.Letδ∈(0,1). Then with probability at least 1 − 2 δit holds thatRˆn≤γn+^1
εG∑
j∈Amax
b∈Nj∩A∑nt=1P ̃tj∑
a∈AQtja(yta−ytb) +√
8 nlog(|H|/δ).Proof Fort∈[n], letBt∼P ̃t. Define the surrogate regretRˆ′n=∑n
t=1〈`Bt−
`a∗n,ut〉. By the definition ofAtandBtand Lemma 37.14 (b) we haveEt− 1 [〈`At−
`Bt,ut〉]≤γ, whereEt− 1 [·] =E[·|A 1 ,B 1 ,...,At− 1 ,Bt− 1 ]. Furthermore,|〈`a−
`b,ut〉|≤1 for alla,b. Therefore, by Hoeffding–Azuma, with probability at least