Bandit Algorithms

37.3 Classification of finite adversarial partial monitoring 456

of their cells is (d−3)-dimensional. Finally,N 3 ={ 1 , 2 , 3 }andN 1 ={ 1 , 3 }and N 23 ={ 2 , 3 , 4 }. Think about how we decided on what losses to use to get the cell decomposition shown in Fig. 37.2.

L=







0 1 1 1 0 1 1 /2 1/2 1/ 2 3 /4 1/4 3/ 4 1 1 /2 1/ 2 1 1 /4 3/ 4







Φ =







1 2 3 1 1 1 1 1 1 1 2 3 1 1 1 1 1 1







C 3

C 2

C 1 u 1

u 2

C 4

C 5

C 6

Figure 37.2Partial monitoring game withk= 6 andd= 3 andF= 3.

37.3 Classification of finite adversarial partial monitoring

The terminology in the last chapter finally allows us to state the main theorem of this chapter that classifies finite adversarial partial monitoring games. Theorem37.9.The minimax regret of partial monitoring problemG= (L,Φ) fal ls into one of four categories:

R∗n(G) =











0 , ifGhas no pairs of neighboring actions; Θ( ̃ √n), ifGis locally observable and has neighboring actions; Θ(n^2 /^3 ), ifGis globally observable, but not local ly observable; Ω(n), otherwise.

The Landau notation is used in the traditional mathematical sense and obscures dependence onk,d,Fand the finer structure ofG= (L,Φ).

The proof is split into parts by proving upper and lower bounds for each part. First up is the lower bounds. We then describe a policy for locally observable games and analyze its regret. The upper bound for globally observable games is left for you in Exercise 37.11.

37.4 Lower bounds

Like for bandits, the lower bounds are most easily proven using a stochastic adversary. In stochastic partial monitoring we assume that u 1 ,...,un are

Bandit Algorithms

































37.3 Classification of finite adversarial partial monitoring











37.4 Lower bounds

Get our desktop app

Company

Features

Documentation

Resources