Bandit Algorithms

37.10 Exercises 475

37.13(Lower bound depending on the number of feedbacks) Consider G= (L,Φ) given by

L=

(

1 0 1 0 ··· 1 0

0 1 0 1 ··· 0 1

)

and

Φ =

(

1 2 2 3 3 4 ··· F− 1 F− 1 F

1 1 2 2 3 3 ··· F− 2 F− 1 F− 1

)

.

(a) Show this game is locally observable.
(b)Prove there exists a universal constantc >0 such thatR∗n(G)≥c(F−1)

√

n.

The source for previous exercise is the paper by the authors [Lattimore and Szepesv ́ari, 2019].

37.14(Divergence decomposition for partial monitoring) Complete the necessary modification of Lemma 15.1 to show that Eq. (37.8) is true.

37.15(Algorithm for classifying games) Write a program that accepts as input matricesLand Φ and outputs the classification of the game.

37.16(Implementation) In this exercise you will investigate the empirical behavior of NeighborhoodWatch2 on matching pennies (Example 37.3) using a stochastic adversary.

(a) Implement NeighborhoodWatch2.
(b)Apply your algorithm to matching pennies for a variety of choices ofcand
stochastic adversariesu. Try to stress your algorithm as much as possible
(for eachcchoose the most challengingu).
(c) Plot your results from the previous part. Tell an interesting story.

Bandit Algorithms

(

1 0 1 0 ··· 1 0

0 1 0 1 ··· 0 1

)

Φ =

(

1 2 2 3 3 4 ··· F− 1 F− 1 F

1 1 2 2 3 3 ··· F− 2 F− 1 F− 1

)

.

√

Get our desktop app

Company

Features

Documentation

Resources