Bandit Algorithms

37.5 Policy for easy games 460

C 1

u C 2

u 1

u 2 C 3

Figure 37.3Lower bound construction for hard partial monitoring problems

Theorem 37.11. IfGis not globally observable and has at least two non- dominated actions, then there exists a constantcG> 0 such thatR∗n(G)≥cGn.

Proof sketch SinceGis not globally observable there exists a pair of neighboring actions (a,b) that are not globally observable. Letube the centroid ofCa∩Cb. LetS∈RkF×dbe the stack of matrices from{Sc:c∈[k]}(all actions). Then using the same argument as the previous proof we have`a−`b∈/im(S>). Now defineq∈Rdsuch that〈`a−`b,q〉= 1 andSq= 0. Let ∆>0 be sufficiently small andua=u−∆qandub=u+∆q. Show thatD(Pua,Pub) = 0 for all policies and complete the proof in the same fashion as the proof of Theorem 37.10.

Theorem37.12.LetG= (L,Φ)be local ly observable and have at least one pair of neighbours. Then there exists a constantcG> 0 such that for al l large enough nthe minimax regret satisfiesR∗n(G)≥cG

√

n.

Proof sketch By assumption there exists a pair of neighbouring actions (a,b). Defineuas the centroid ofCa∩Cband letq= (`a−`b)/‖`a−`b‖^2. For sufficiently small ∆>0 letua=u−∆qandub=u+ ∆q. Then

D(Pua,Pub)≤n

∑d

i=1

(uai−ubi)^2 ubi

≤CGn∆^2 ,

whereCG>0 is a game-dependent constant. Let ∆ = 1/

√

nand apply the ideas in the proof of Theorem 37.10.

37.5 Policy for easy games

In this section we describe a policy for locally observable games called NeighborhoodWatch2. Fix a locally observable gameG= (L,Φ) with at least one pair of neighboring actions. In every round the policy always chooses

Bandit Algorithms

√

√

37.5 Policy for easy games

Get our desktop app

Company

Features

Documentation

Resources