Bandit Algorithms

(Jeff_L) #1
37.6 Upper bound for easy games 467

By part (d) of Lemma 37.14 we havePtb≥γ/kfor alltandb∈[k]. Furthermore,
|vaj(At,Φt)|≤Vso thatη|Zˆtja|=|ηP ̃tjvaj(At,Φt)/PtAt|≤ηV k/γ= 1, where
the equality follows from the choice ofγ. Assumption(b)is satisfied in a similar
way withηβtja=η^2 V^2



b∈NajP ̃
tj^2 /Ptb≤η^2 k^2 V^2 /γ=ηV ≤1, where in the
last inequality we used the definition ofηand assumed thatn≥log(k/δ). To
make sure that the regret bound holds even for smaller values ofn, we require
CG≥k


log(ek)so that whenn < k^2 log(k/δ), the regret bound is trivial. For
assumption(c), we have

Et− 1 [Zˆtja^2 ] =Et− 1

[(


P ̃tjvaj(At,Φt)
PtAt

) 2 ]


≤V^2 P ̃tj^2 Et− 1

[


INaj(At)
PtA^2 t

]


=V^2



b∈Naj

P ̃tj^2
Ptb

=


βtja
η

.


Finally(d)is satisfied by the definition ofvajand the fact thatPt∈ri(Pk− 1 ).
The result of Exercise 12.2 shows that with probability at least 1−(k+ 1)δ,


Rˆnj≤3 log(1/δ)
η

+ 5


∑n

t=1


a∈Nj∩A

Qtjaβtja+η

∑n

t=1


a∈Nj∩A

QtjaZˆtja^2.

Step 2: Aggregating the local regret
Using the result from the previous step in combination with a union bound over
j∈Awe have that with probability at least 1−k(k+ 1)δ,

j∈A

Rˆnj≤^3 klog(1/δ)
η

+ 5


∑n

t=1


j∈A


a∈Nj∩A

Qtjaβtja


∑n

t=1


j∈A


a∈Na∩A

QtjaZˆtja^2. (37.18)

the second term is bounded using the definition ofβtjain Eq. (37.12).

a∈Nj∩A

Qtjaβtja=ηV^2


a∈Nj∩A

Qtja


b∈Naj

P ̃tj^2
Ptb

=ηV^2 P ̃tj


a∈Nj∩A

Qtja


b∈Naj

P ̃tj
Ptb

.


Next split the sum that runs overb∈Najinto two, separating duplicates ofj
and the rest:

a∈Nj∩A

Qtja


b∈Naj

P ̃tj
Ptb

=



a∈Nj∩A

Qtja


b:`b=`j

P ̃tj
Ptb

+



a∈Nj∩A

Qtja


b∈Naj:`b 6 =`j

P ̃tj
Ptb

=



b:`b=`j

P ̃tj
Ptb

+



a∈Nj∩A


b∈Naj:`b 6 =`j

QtjaP ̃tj
Ptb

≤ 4 k




b:`b=`j

1 +



a∈Nj∩A


b∈Naj:`b 6 =`j

1



≤ 4 k^2 , (37.19)
Free download pdf