Bandit Algorithms

(Jeff_L) #1
37.4 Lower bounds 457

chosen independently at random from the same distribution. To emphasize
the randomness we switch to capital letters. Given a partial monitoring game
G= (L,Φ) and probability vectoru∈ Pd− 1 the stochastic partial monitoring
environment associated withusamples a sequence of independently and identically
distributed random variablesI 1 ,...,InwithP(It=i)=uiandUt=eIt. In each
roundta policy chooses actionAtand receives feedback Φt= ΦAtIt. The regret
is

Rn(π,u) = max
a∈[k]

E


[n

t=1

〈`At−`a,Ut〉

]


= max
a∈[k]

E


[n

t=1

〈`At−`a,u〉

]


.


The reader should check thatR∗n(G)≥minπmaxu∈Pd− 1 Rn(π,u), which allows
us to restrict our attention to stochastic partial monitoring problems. Given
u,q∈Pd− 1 , letD(u,q) be the relative entropy between categorical distributions
with parametersuandqrespectively:


D(u,q) =

∑k

i=1

uilog

(


ui
qi

)



∑k

i=1

(ui−qi)^2
qi

, (37.4)


where the second inequality follows from the fact that for measuresP,Qwe have
D(P,Q)≤χ^2 (P,Q) (see Note 5 in Chapter 13).


Theorem37.10. LetG= (L,Φ)be a globally observable partial monitoring
problem that is not locally observable. Then there exists a constantcG> 0 such
thatR∗n(G)≥cGn^2 /^3.


Proof The proof involves several steps. Roughly, we need to define two alternative
stochastic partial monitoring problems. We then show these environments are
hard to distinguish without playing an action associated with a large loss. Finally
we balance the cost of distinguishing the environments against the linear cost of
playing randomly.


Step 1: Defining the alternatives
Leta,bbe a pair neighboring actions that are not locally observable. Then by
definitionCa∩Cbis a polytope of dimensiond−2. Letube the centroid of
Ca∩Cband
ε= min
c/∈Nab

〈`c−`a,u〉. (37.5)

The value ofεis well defined, since by global observability ofG, but nonlocal
observability of (a,b) there must exist some actionc /∈Nab. Furthermore, since
c /∈Nabit follows thatε >0. As in the lower bound constructions for stochastic
bandits, we now define two stochastic partial monitoring problems. Since (a,b)
are not locally observable, there does not exist a functionv: [k]×[F]→Rsuch
that for alli∈[d],


c∈Nab

v(c,Φci) =`ai−`bi. (37.6)
Free download pdf