Bandit Algorithms

37.6 Upper bound for easy games 465

twice that〈`a−`b,w〉= 0, we calculate

〈`a−`b,u〉=〈`a−`b,u−w〉=

‖u−w‖ 2 ‖v−w‖ 2 〈`a−`b,w−v〉

=

‖u−w‖ 2 ‖v−w‖ 2

〈`b−`a,v〉> 0 , (37.14)

where the second equality used thatw 6 =vis a point of the line segment connecting
vandu, hencew−vandu−ware parallel and share the same direction and
‖v−w‖ 2 >0 (see Fig. 37.4), and the last inequality follows becausevis the
centroid ofCaanda,bare distinct Pareto optimal actions.
Letvcbe the centroid ofCcfor anyc∈A. Then,

〈`a−` ̃a,u〉〈`a−`b,u〉=

〈à−à ̃,w+u−w〉〈à−`b,u〉

(a) ≤

〈à−`b,w〉+〈à−à ̃,u−w〉〈à−`b,u〉 (b) =

〈`a−` ̃a,u−w〉〈`a−`b,u〉

(c) =

‖v−w‖ 2 〈`a−` ̃a,u−w〉 ‖u−w‖ 2 〈`b−`a,v〉 (d) ≤

‖v−w‖ 2 ‖`a−` ̃a‖ 2 〈`b−`a,v〉

(e) ≤

√

2 d minc∈Amine∈Nc\{c}〈`e−`c,vc〉

=

1

εG

,

where (a) follows since by(37.14),〈a−b,u〉>0 and also becausew∈Cb
implies that〈a− ̃a,w〉 ≤ 〈a−b,w〉, (b) follows since〈a−b,w〉= 0,
which is used in other steps as well. (c) uses(37.14), (d) is by Cauchy-Schwarz
and in (e) we bounded‖w−v‖ 2 ≤

√

2 and used that‖`a−` ̃a‖ 2 ≤

√

dand 〈`b−à,v〉=〈`b−à,va〉 ≥minc∈Amine∈Nc\{c}〈è−`c,vc〉>0. The final equality serves as the definition of 1/εG.

Lemma37.18.LetHbe the set of functionsφ:A→Awithφ(a)∈Na∩Afor
alla∈ Aand definea∗n=argmina∈[k]

∑n
t=1〈`a,ut〉. Then for any sequence of
actions(Bt)nt=1inA,

∑n

t=1

〈`Bt−`a∗n,ut〉≤^1 εG

max φ∈H

∑n

t=1

〈`Bt−`φ(Bt),ut〉.

Proof With no loss of generality, we can assume thata∗n∈ AbecauseAis a maximal duplicate-free subset of Pareto optimal actions. Apply the previous lemma on subsequences of rounds whereBt=afor eacha∈A.

Lemma37.19.Letδ∈(0,1). Then with probability at least 1 − 2 δit holds that

Rˆn≤γn+^1 εG

∑

j∈A

max b∈Nj∩A

∑n

t=1

P ̃tj

∑

a∈A

Qtja(yta−ytb) +

√

8 nlog(|H|/δ).

Proof Fort∈[n], letBt∼P ̃t. Define the surrogate regretRˆ′n=

∑n t=1〈`Bt− à∗n,ut〉. By the definition ofAtandBtand Lemma 37.14 (b) we haveEt− 1 [〈Àt− `Bt,ut〉]≤γ, whereEt− 1 [·] =E[·|A 1 ,B 1 ,...,At− 1 ,Bt− 1 ]. Furthermore,|〈à− `b,ut〉|≤1 for alla,b. Therefore, by Hoeffding–Azuma, with probability at least

Bandit Algorithms

√

=

1

,

√

√

∑

∑

√

Get our desktop app

Company

Features

Documentation

Resources