Bandit Algorithms

(Jeff_L) #1
37.3 Classification of finite adversarial partial monitoring 456

of their cells is (d−3)-dimensional. Finally,N 3 ={ 1 , 2 , 3 }andN 1 ={ 1 , 3 }and
N 23 ={ 2 , 3 , 4 }. Think about how we decided on what losses to use to get the
cell decomposition shown in Fig. 37.2.

L=











0 1 1
1 0 1
1 /2 1/2 1/ 2
3 /4 1/4 3/ 4
1 1 /2 1/ 2
1 1 /4 3/ 4











Φ =











1 2 3
1 1 1
1 1 1
1 2 3
1 1 1
1 1 1











C 3

C 2

C 1
u 1

u 2

C 4

C 5

C 6

Figure 37.2Partial monitoring game withk= 6 andd= 3 andF= 3.

37.3 Classification of finite adversarial partial monitoring


The terminology in the last chapter finally allows us to state the main theorem
of this chapter that classifies finite adversarial partial monitoring games.
Theorem37.9.The minimax regret of partial monitoring problemG= (L,Φ)
fal ls into one of four categories:

R∗n(G) =














0 , ifGhas no pairs of neighboring actions;
Θ( ̃ √n), ifGis locally observable and has neighboring actions;
Θ(n^2 /^3 ), ifGis globally observable, but not local ly observable;
Ω(n), otherwise.

The Landau notation is used in the traditional mathematical sense and
obscures dependence onk,d,Fand the finer structure ofG= (L,Φ).

The proof is split into parts by proving upper and lower bounds for each part.
First up is the lower bounds. We then describe a policy for locally observable
games and analyze its regret. The upper bound for globally observable games is
left for you in Exercise 37.11.

37.4 Lower bounds


Like for bandits, the lower bounds are most easily proven using a stochastic
adversary. In stochastic partial monitoring we assume that u 1 ,...,un are
Free download pdf