Bandit Algorithms

37.8 Notes 470

matrix multiplication,A(x) =Ax. TheimageofAisim(A) ={Ax:x∈Rn}
and thekernelisker(A) ={x∈Rn:Ax= 0}. Notice thatim(A)⊆Rmand
ker(A)⊆Rn. One can easily check thatim(A) andker(A>) are linear subspaces
and an elementary theorem in linear algebra says thatim(A)⊕ker(A>) =Rm
for any matrixA ∈Rm×n. Finally, ifu∈im(A) andv∈ker(A>), then
〈u,v〉= 0. There are probably hundreds of introductory texts on linear algebra.
A short and intuitive exposition is by Axler [1997].
4 Given a setA⊆Rdtheaffine hullis the set

aff(A) =

{ j ∑

i=1

αixj:j > 0 , α∈Rj, xi∈Afor alli∈[j] and

∑j

i=1

αi= 1

}

.

Its dimension is the smallestmsuch that there exist vectorsv 1 ,...,vm∈Rd
such that aff(A) =x◦+ span(v 1 ,...,vm) for anyx◦∈A.
5 We introduced the stochastic variant of partial monitoring to prove our lower
bounds. Of course our upper bounds also apply to this setting, which means the
classification theorem also holds in the stochastic case. The interesting question
is to understand the problem-dependent regret, which for partial monitoring
problemG= (L,Φ) is

Rn(π,u) = max a∈[k]

E

[n ∑

t=1

〈`At−`a,Ut〉

]

,

whereU,U 1 ,...,Unis a sequence of independent and identically distributed
random vectors withUt∈ {e 1 ,...,ed}andE[U] =u∈ Pd− 1. ProvidedGis
not hopeless one can derive an algorithm for which the regret is logarithmic,
and like in bandits there is a sense of asymptotic optimality. The open research
question is to understand the in-between regime where the horizon is not yet
large enough that the asymptotically optimal logarithmic regret guarantees
become meaningful, but not so small that minimax is acceptable.
6 In the proof of Lemma 37.17 we used the overpowered Jordan–Brouwer
separation theorem to guarantee that the line segment that connectsuwith the
centroidvofCahas a nonempty intersection with the boundary ofCa. Here,u
was a point that lied outside ofCa. The Jordan–Brouwer separation theorem
generalizes the Jordan curve theorem, which states that every simple closed
planar curve separates the plane into a bounded interior and an unbounded
exterior region so that the boundary of both regions is the said planar curve.
The Jordan–Brouwer theorem states that the same holds in higher dimensions
where the closed planar curve becomes a topological sphere, which is the image
of the unit sphere ofRdunder some continuous injective map from the sphere
intoRd. To use the theorem we view the simplexPd− 1 as a subset ofRd−^1 by
dropping the last entry in the coordinate representation of the points ofPd− 1.
Then the boundary ofCacan be seen as a topological sphere inRd−^1 andv
belongs to the interior, whileubelongs to the exterior region created by the
boundary ofCa. The line segment connectinguandvwill pass through the

Bandit Algorithms

}

.

E

]

,

Get our desktop app

Company

Features

Documentation

Resources