Bandit Algorithms

(Jeff_L) #1
18.2 Bandits with expert advice 217

Adversary secretly chooses rewardsx∈[0,1]n×k
Experts secretly choose predictionsE(1),...,E(n)
For roundst= 1, 2 ,...,n:
Learner observes predictions of all experts,E(t)∈[0,1]M×k.
Learner selects a distributionPt∈Pk− 1.
ActionAtis sampled fromPtand the reward isXt=xtAt.

Figure 18.3Interaction protocol for bandits with expert advice.

the internal structure of Φ. In fact, once Φ has been chosen, the contexts play
very little role. All we need in each round is the output of each function.

18.2.1 Bandits with expert advice framework


Thebandits with expert advicesetting is ak-armed adversarial bandit, but
withMexperts making recommendations to the learner. At the beginning of each
round the experts announce their predictions about which actions are the most
promising. For the sake of generality, the experts report a probability distribution
over the actions. The interpretation is that the expert, if the decision were left
to them, would choose the action for the round at random from the probability
distribution it reported. As discussed before, in an adversarial setting it is natural
to consider randomized algorithms, hence one should not be too surprised that the
experts are also allowed to randomize. An application to an important practical
problem is illustrated in Fig. 18.2.
The predictions of theMexperts in roundtare represented by a matrix
E(t) ∈ [0,1]M×k where the mth rowEm(t) is a probability vector over [m]
representing the recommendations of expertmin roundt. SinceEm(t)is a row-
vector, for ak-dimensional vectorx, the expressionE(mt)xtis well-defined. The
learner and the environment interact according to the protocol in Fig. 18.3.
The regret measures the cumulative rewards collected by the learner relative
to the best expert in hindsight.

Rn=E

[


max
m∈[M]

∑n

t=1

E(mt)xt−

∑n

t=1

Xt

]


. (18.6)


This framework assumes the experts are oblivious in the sense that their
predictions do not depend on the actions of the learner.
Free download pdf