Bandit Algorithms

(Jeff_L) #1
37.6 Upper bound for easy games 465

twice that〈`a−`b,w〉= 0, we calculate

〈`a−`b,u〉=〈`a−`b,u−w〉=

‖u−w‖ 2
‖v−w‖ 2 〈`a−`b,w−v〉

=

‖u−w‖ 2
‖v−w‖ 2

〈`b−`a,v〉> 0 , (37.14)

where the second equality used thatw 6 =vis a point of the line segment connecting
vandu, hencew−vandu−ware parallel and share the same direction and
‖v−w‖ 2 >0 (see Fig. 37.4), and the last inequality follows becausevis the
centroid ofCaanda,bare distinct Pareto optimal actions.
Letvcbe the centroid ofCcfor anyc∈A. Then,


〈`a−` ̃a,u〉
〈`a−`b,u〉=

〈`a−`a ̃,w+u−w〉
〈`a−`b,u〉

(a)

〈`a−`b,w〉+〈`a−`a ̃,u−w〉
〈`a−`b,u〉
(b)
=

〈`a−` ̃a,u−w〉
〈`a−`b,u〉

(c)
=

‖v−w‖ 2 〈`a−` ̃a,u−w〉
‖u−w‖ 2 〈`b−`a,v〉
(d)

‖v−w‖ 2 ‖`a−` ̃a‖ 2
〈`b−`a,v〉

(e)


2 d
minc∈Amine∈Nc\{c}〈`e−`c,vc〉

=


1


εG

,


where (a) follows since by(37.14),〈a−b,u〉>0 and also becausew∈Cb
implies that〈a− ̃a,w〉 ≤ 〈a−b,w〉, (b) follows since〈a−b,w〉= 0,
which is used in other steps as well. (c) uses(37.14), (d) is by Cauchy-Schwarz
and in (e) we bounded‖w−v‖ 2 ≤



2 and used that‖`a−` ̃a‖ 2 ≤


dand
〈`b−`a,v〉=〈`b−`a,va〉 ≥minc∈Amine∈Nc\{c}〈`e−`c,vc〉>0. The final
equality serves as the definition of 1/εG.

Lemma37.18.LetHbe the set of functionsφ:A→Awithφ(a)∈Na∩Afor
alla∈ Aand definea∗n=argmina∈[k]


∑n
t=1〈`a,ut〉. Then for any sequence of
actions(Bt)nt=1inA,


∑n

t=1

〈`Bt−`a∗n,ut〉≤^1
εG

max
φ∈H

∑n

t=1

〈`Bt−`φ(Bt),ut〉.

Proof With no loss of generality, we can assume thata∗n∈ AbecauseAis a
maximal duplicate-free subset of Pareto optimal actions. Apply the previous
lemma on subsequences of rounds whereBt=afor eacha∈A.

Lemma37.19.Letδ∈(0,1). Then with probability at least 1 − 2 δit holds that

Rˆn≤γn+^1
εG


j∈A

max
b∈Nj∩A

∑n

t=1

P ̃tj


a∈A

Qtja(yta−ytb) +


8 nlog(|H|/δ).

Proof Fort∈[n], letBt∼P ̃t. Define the surrogate regretRˆ′n=

∑n
t=1〈`Bt−
`a∗n,ut〉. By the definition ofAtandBtand Lemma 37.14 (b) we haveEt− 1 [〈`At−
`Bt,ut〉]≤γ, whereEt− 1 [·] =E[·|A 1 ,B 1 ,...,At− 1 ,Bt− 1 ]. Furthermore,|〈`a−
`b,ut〉|≤1 for alla,b. Therefore, by Hoeffding–Azuma, with probability at least
Free download pdf