Bandit Algorithms

37.2 The structure of partial monitoring 453

vectors (ut)nt=1byut=eit. No information has been lost in this transformation from (it) to (ut), so from now on we assume the adversary directly chooses (ut) and make no use of (it). Let`a∈Rdbe theath row of matrixL. The loss suffered in roundtwhen choosing actionaisyta=〈`a,ut〉. Let ̄ut=^1 t

∑t
s=1us∈Pd−^1 be the vector of mean frequencies of the adversary’s
choices overtrounds. An actionais optimal in hindsight ifmaxb〈a−b, ̄un〉= 0.
Thecellof an actionais subset ofPd− 1 on which it is optimal:

Ca=

{

u∈Pd− 1 : max b∈[k]

〈`a−`b,u〉≤ 0

}

,

which is a convex polytope. The collection{Ca :a∈[k]}is called thecell
decomposition. Actions withCa=∅are calleddominatedbecause they are
never optimal, no matter how the adversary plays. For nondominated actions
we define thedimensionof an action to be the dimension of theaffine hullof
Ca. Readers unfamiliar with the affine hull should read Note 4 at the end of the
chapter. A nondominated action is calledPareto optimalif it has dimension
d−1 anddegenerateotherwise. Actionsaandbareduplicatesifa=b.
Pareto optimal actionsaandbareneighborsifCa∩Cbhas dimensiond−2.
Note that ifaandbare Pareto optimal duplicates, thenCa∩Cbhas dimension
d−1 and the definition means thataandbare not neighbors. For Pareto optimal
actionawe letNabe the set consisting ofaand its neighbors. Given a pair of
neighbors (a,b) we letNab={c∈[k] :Ca∩Cb⊆Cc}, while for Pareto optimal
actionawe letNaa=∅.

Dominated and degenerate actions can never be uniquely optimal in hindsight, but their presence can make the difference between a hard game and a hopeless one. Ifc > 1 /2, then the third action in matching pennies is dominated, but without it the learner would suffer linear regret. Duplicate actions are only duplicate in the sense that they have the same loss. They may have different feedback structures and so cannot be trivially combined.

Letaandbbe neighboring actions. The next lemma characterizes actions inNabas eithera,b, duplicates ofa,bor degenerate actionscfor which`cis a convex combination of`aand`b. The situation is illustrated whend= 2 in Fig. 37.1.

Lemma37.7.Leta,bbe neighboring actions andc∈Nabbe an action such that `c∈{/ `a,`b}. Then

(a) There exists anα∈(0,1)such thatc=αa+ (1−α)`b.
(b)Cc=Ca∩Cb.
(c)chas dimensiond− 2.

Proof We use the fact that ifX ⊆ Y ⊆ Rdand dim(X) =dim(Y), then

Bandit Algorithms

{

}

,

Get our desktop app

Company

Features

Documentation

Resources