Bandit Algorithms

(Jeff_L) #1
37.5 Policy for easy games 460

C 1

u C 2

u 1

u 2
C 3

Figure 37.3Lower bound construction for hard partial monitoring problems

Theorem 37.11. IfGis not globally observable and has at least two non-
dominated actions, then there exists a constantcG> 0 such thatR∗n(G)≥cGn.

Proof sketch SinceGis not globally observable there exists a pair of neighboring
actions (a,b) that are not globally observable. Letube the centroid ofCa∩Cb.
LetS∈RkF×dbe the stack of matrices from{Sc:c∈[k]}(all actions). Then
using the same argument as the previous proof we have`a−`b∈/im(S>). Now
defineq∈Rdsuch that〈`a−`b,q〉= 1 andSq= 0. Let ∆>0 be sufficiently
small andua=u−∆qandub=u+∆q. Show thatD(Pua,Pub) = 0 for all policies
and complete the proof in the same fashion as the proof of Theorem 37.10.

Theorem37.12.LetG= (L,Φ)be local ly observable and have at least one pair
of neighbours. Then there exists a constantcG> 0 such that for al l large enough
nthe minimax regret satisfiesR∗n(G)≥cG


n.

Proof sketch By assumption there exists a pair of neighbouring actions (a,b).
Defineuas the centroid ofCa∩Cband letq= (`a−`b)/‖`a−`b‖^2. For sufficiently
small ∆>0 letua=u−∆qandub=u+ ∆q. Then

D(Pua,Pub)≤n

∑d

i=1

(uai−ubi)^2
ubi

≤CGn∆^2 ,

whereCG>0 is a game-dependent constant. Let ∆ = 1/


nand apply the ideas
in the proof of Theorem 37.10.

37.5 Policy for easy games


In this section we describe a policy for locally observable games called
NeighborhoodWatch2. Fix a locally observable gameG= (L,Φ) with at least
one pair of neighboring actions. In every round the policy always chooses
Free download pdf