
Substituting into Eq. (18.11) leads to
\[
R_n \le \frac{\log(M)}{\eta} + \frac{\eta n k}{2} = \sqrt{2nk\log(M)}.
\]
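The equality holds for the learning rate that balances the two terms: setting the derivative of f(η) = log(M)/η + ηnk/2 to zero gives
\[
\eta = \sqrt{\frac{2\log(M)}{nk}},
\qquad
f(\eta) = \sqrt{\frac{nk\log(M)}{2}} + \sqrt{\frac{nk\log(M)}{2}} = \sqrt{2nk\log(M)}.
\]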

Let us see how this theorem can be applied to the contextual bandit where C is a finite set and Φ is the set of all functions from C to [k]. To each of these functions φ ∈ Φ we associate an expert m with E_{mi}(t) = I{φ(c_t) = i}. Then M = k^{|C|} and Theorem 18.1 says that
\[
R_n \le \sqrt{2nk|C|\log(k)},
\]
which is the same bound we derived using an independent copy of Exp3 for each context. More generally, if C is arbitrary (possibly infinite) and Φ is a finite set of functions from C to [k], then the theorem ensures that
\[
R_n \le \sqrt{2nk\log(|\Phi|)}.
\]
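To make the reduction concrete, here is a minimal sketch (not from the book) of how the expert set for a finite context space can be materialized; `full_expert_advice` and its argument names are illustrative, and the brute-force enumeration is only feasible when k^{|C|} is small:

```python
from itertools import product

import numpy as np


def full_expert_advice(contexts, n_contexts, k):
    """Advice E[t, m, i] = I{phi_m(c_t) = i} for ALL functions phi: C -> [k].

    contexts: observed context indices c_1, ..., c_n (each in range(n_contexts)).
    Returns an array of shape (n, M, k) with M = k ** n_contexts experts.
    """
    phis = list(product(range(k), repeat=n_contexts))  # all k^|C| functions
    n, M = len(contexts), len(phis)
    E = np.zeros((n, M, k))
    for t, c in enumerate(contexts):
        for m, phi in enumerate(phis):
            E[t, m, phi[c]] = 1.0  # expert m deterministically plays phi_m(c_t)
    return E


# |C| = 3 contexts and k = 2 actions give M = 2^3 = 8 experts.
E = full_expert_advice(contexts=[0, 2, 1, 0], n_contexts=3, k=2)
assert E.shape == (4, 8, 2)
```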

These results seem quite promising already, but in fact there is another
improvement possible. The intuition is that learning should be easier if the
experts have a high degree of agreement. One way to measure this is by

\[
E^*_t = \sum_{s=1}^{t} \sum_{i=1}^{k} \max_{m \in [M]} E_{mi}(s).
\]

In Exercise 18.7 you will show that if all experts make identical recommendations, then E^*_t = t, and that no matter how the experts behave,
\[
E^*_n \le n \min(k, M). \tag{18.13}
\]
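Continuing the sketch above, E^*_n is cheap to compute from an advice array, and both facts are easy to check numerically (function names again illustrative):

```python
import numpy as np


def e_star(E):
    """E*_n = sum_t sum_i max_m E[t, m, i] for advice E of shape (n, M, k)."""
    return E.max(axis=1).sum()  # max over experts, then sum over rounds and actions


n, M, k = 5, 3, 4

# Identical deterministic experts: sum_i max_m E[t, m, i] = 1 each round, so E*_n = n.
identical = np.zeros((n, M, k))
identical[:, :, 0] = 1.0
assert e_star(identical) == n

# Arbitrary (randomized) advice still satisfies E*_n <= n min(k, M), i.e. Eq. (18.13).
rng = np.random.default_rng(0)
arbitrary = rng.dirichlet(np.ones(k), size=(n, M))  # shape (n, M, k)
assert e_star(arbitrary) <= n * min(k, M)
```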
In this sense E^*_n/n can be viewed as the effective number of experts, which depends on the degree of disagreement in the experts' recommendations. By modifying the algorithm to use a time-varying learning rate, one can prove the following theorem.
Theorem 18.3. Assume the same conditions as in Theorem 18.1, except let \(\eta_t = \sqrt{\log(M)/E^*_t}\). Then there exists a universal constant C > 0 such that
\[
R_n \le C\sqrt{E^*_n \log(M)}. \tag{18.14}
\]
The proof of Theorem 18.3 is not hard and is left to Exercise 18.4. The bound tells us that Exp4 with the suggested learning rate is able to adapt to the degree of disagreement between the experts, which is quite an encouraging result. As a further benefit, the learning rate does not depend on the horizon, so the algorithm is anytime.
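As an illustration, here is a minimal sketch of Exp4 with this anytime learning rate, assuming losses in [0, 1] and the usual importance-weighted loss estimator; the function and argument names are ours, not the book's:

```python
import numpy as np


def exp4_anytime(advice, loss_fn, rng=None):
    """Sketch of Exp4 with the anytime rate eta_t = sqrt(log(M) / E*_t).

    advice:  array of shape (n, M, k); advice[t, m] is expert m's
             probability vector over the k actions in round t.
    loss_fn: loss_fn(t, a) -> loss of action a in round t, assumed in [0, 1].
    """
    rng = rng or np.random.default_rng()
    n, M, k = advice.shape
    L_hat = np.zeros(M)   # cumulative importance-weighted expert losses
    e_star = 0.0          # running E*_t
    total_loss = 0.0
    for t in range(n):
        E_t = advice[t]                           # (M, k) advice matrix
        e_star += E_t.max(axis=0).sum()           # update E*_t with round t's advice
        eta = np.sqrt(np.log(M) / e_star)         # eta_t of Theorem 18.3
        w = np.exp(-eta * (L_hat - L_hat.min()))  # shift for numerical stability
        Q = w / w.sum()                           # distribution over experts
        P = Q @ E_t                               # induced distribution over actions
        a = rng.choice(k, p=P / P.sum())
        y = loss_fn(t, a)
        total_loss += y
        y_hat = np.zeros(k)
        y_hat[a] = y / P[a]                       # importance-weighted loss estimate
        L_hat += E_t @ y_hat                      # propagate the estimate to the experts
    return total_loss
```

Note that η_t depends only on the advice observed up to round t, so the horizon n never enters the algorithm; this is what makes it anytime.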

18.5 Notes


1 The most important concept in this chapter is that there are tradeoffs when
choosing the competitor class. A large class leads to a more meaningful definition