Bandit Algorithms

(Jeff_L) #1
This material will be published by Cambridge University Press as Bandit
Algorithms by Tor Lattimore and Csaba Szepesvari. This pre-publication version
is free to view and download for personal use only. Not for re-distribution, sale,
or use in derivative works.©Tor Lattimore and Csaba Szepesvari 2017.
The latest version is available athttp://banditalgs.com. Feedback on any
aspect is very welcome: [email protected]

30 Combinatorial Bandits


A combinatorial bandit is a linear bandit with a special kind of combinatorial
action setA⊂{ 0 , 1 }d, which for constantm∈[d] satisfies

A⊆

{


a∈{ 0 , 1 }d:‖a‖ 1 ≤m

}


The setting is studied in both adversarial and stochastic models. We focus on the
former in this chapter and discuss the latter in the notes. As usual the adversary
chooses a sequence of loss vectorsy 1 ,...,ynwithyt∈Rdand the expected regret
is

Rn=E

[


max
a∈A

∑n

t=1

〈At−a,yt〉

]


In Chapters 27 and 28 we assumed thatyt∈ {y:supa∈A|〈a,y〉| ≤ 1 }. This
restriction is not consistent with the applications we have in mind, so instead we
assume thatyt∈[0,1]d, which by the definition ofAensures that|〈At,yt〉|≤m
for allt. In the standard bandit model the learner observes〈At,yt〉in each
round. In many applications for which combinatorial bandits are applied there
is actually more information available. The simplest is thefull information
setting where the learner observes the whole vectoryt. The full information setup
is interesting, but does not have the flavor of a bandit problem and so we do
not discuss it further. There is an inbetween setting where the learner receives
semibanditfeedback, which is the vector (At 1 yt 1 ,...,Atdytd). SinceAti∈{ 0 , 1 }
this is equivalent to observingytifor thoseiwhenAti= 1.

30.1 Applications


Shortest path problems
LetG= (V,E) be a fixed graph with a finite set of verticesVand edgesE⊆V×V
with|E|=d. The online shortest path problem is a game overnrounds between
an adversary and a learner. Given fixed verticesu,v∈V, the learner’s objective
in each round is to find the shortest path betweenuandv. At the beginning of the
game the adversary chooses a sequence of vectorsy 1 ,...,ynwithytirepresenting
the length of theith edge inEin roundt. In each round the learner chooses a
path betweenuandv. The regret of the learner is the difference between the
Free download pdf