Bandit Algorithms

37.6 Upper bound for easy games 467

∑

b∈NajP ̃ tj^2 /Ptb≤η^2 k^2 V^2 /γ=ηV ≤1, where in the last inequality we used the definition ofηand assumed thatn≥log(k/δ). To make sure that the regret bound holds even for smaller values ofn, we require CG≥k

√

log(ek)so that whenn < k^2 log(k/δ), the regret bound is trivial. For assumption(c), we have

Et− 1 [Zˆtja^2 ] =Et− 1

[(

P ̃tjvaj(At,Φt) PtAt

) 2 ]

≤V^2 P ̃tj^2 Et− 1

[

INaj(At) PtA^2 t

]

=V^2

∑

b∈Naj

P ̃tj^2 Ptb

=

βtja η

.

Finally(d)is satisfied by the definition ofvajand the fact thatPt∈ri(Pk− 1 ).
The result of Exercise 12.2 shows that with probability at least 1−(k+ 1)δ,

Rˆnj≤3 log(1/δ) η

+ 5

∑n

t=1

∑

a∈Nj∩A

Qtjaβtja+η

∑n

t=1

∑

a∈Nj∩A

QtjaZˆtja^2.

Step 2: Aggregating the local regret Using the result from the previous step in combination with a union bound over j∈Awe have that with probability at least 1−k(k+ 1)δ, ∑

j∈A

Rˆnj≤^3 klog(1/δ) η

+ 5

∑n

t=1

∑

j∈A

∑

a∈Nj∩A

Qtjaβtja

+η

∑n

t=1

∑

j∈A

∑

a∈Na∩A

QtjaZˆtja^2. (37.18)

the second term is bounded using the definition ofβtjain Eq. (37.12). ∑

a∈Nj∩A

Qtjaβtja=ηV^2

∑

a∈Nj∩A

Qtja

∑

b∈Naj

P ̃tj^2 Ptb

=ηV^2 P ̃tj

∑

a∈Nj∩A

Qtja

∑

b∈Naj

P ̃tj Ptb

.

Next split the sum that runs overb∈Najinto two, separating duplicates ofj and the rest: ∑

a∈Nj∩A

Qtja

∑

b∈Naj

P ̃tj Ptb

=

∑

a∈Nj∩A

Qtja

∑

b:`b=`j

P ̃tj Ptb

+

∑

a∈Nj∩A

Qtja

∑

b∈Naj:`b 6 =`j

P ̃tj Ptb

=

∑

b:`b=`j

P ̃tj Ptb

+

∑

a∈Nj∩A

∑

b∈Naj:`b 6 =`j

QtjaP ̃tj Ptb

≤ 4 k





∑

b:`b=`j

1 +

∑

a∈Nj∩A

∑

b∈Naj:`b 6 =`j

1



≤ 4 k^2 , (37.19)

Bandit Algorithms

∑

√

[(

) 2 ]

[

]

=V^2

∑

=

.

+ 5

∑

∑

+ 5

∑

∑

∑

∑

∑

∑

∑

∑

.

∑

=

∑

∑

+

∑

∑

=

∑

+

∑

∑





∑

1 +

∑

∑

1



Get our desktop app

Company

Features

Documentation

Resources