Bandit Algorithms

20.2 Notes 245

And things have worked out beautifully. Since M ̄t is a nonnegative supermartingale, the maximal inequality (Theorem 3.9) shows that

P

(

sup t∈N

log(M ̄t)≥log

(

1

δ

))

=P

(

sup t∈N

M ̄t≥^1 δ

)

≤δ.

The result follows by substituting Eq. (20.8) into the above display and rearranging.

20.2 Notes

1 Recall from the previous chapter that when‖At‖ 2 ≤Lis assumed, then

detVt(λ) λd

≤

(

trace

(

Vt(λ) λd

))d ≤

(

1 +

nL^2 λd

)d

. (20.9)

In general the log determinant form should be preferred when confidence intervals are used as part of an algorithm, but the right-hand side has a concrete form that can be useful when stating regret bounds. 2 The bounds in the previous note can be quite tight, which suggests the results provided by Theorem 20.5 suffer from a worse dependence on the dimension than what was possible in the fixed design setting (Eq. (20.2)). You might ask whether or not this is really necessary? In Exercise 20.2 you answer this question in the affirmative. In particular, there exists a sequential design such that

E

[

〈x,θˆt−θ∗〉^2 /‖x‖^2 Vt− 1

]

= Ω(d).

When the actions are chosen using a fixed design, however, integrating Eq. (20.2) shows thatE[〈x,θˆt−θ∗〉^2 /‖x‖^2 V− 1 t

] =O(1).

3 Supermartingales arise naturally in proofs relying on the Cramer-Chernoff method. Just one example is the proof of Lemma 12.2. One could rewrite most of the proofs involving sums of random variables relying on the Cramer- Chernoff method in a way that it would become clear that the proof hinges on the supermartingale property of an appropriate sequence.

20.3 Bibliographic remarks

Bounds like those given in Theorem 20.5 are called self-normalized bounds [de la Pe ̃na et al., 2008]. The method of mixtures goes back to the work by Robbins and Siegmund [1970]. In practice, the improvement provided by the method of mixtures relative to the covering arguments is quite large. A historical account of martingale methods in sequential analysis is by Lai [2009]. A simple proof of Lemma 20.1 appears as Lemma 2.5 in the book by van de Geer [2000]. Calculating covering numbers (or related packing numbers) is a whole field by itself, with

Bandit Algorithms

(

(

1

))

=P

(

)

20.2 Notes

≤

(

(

(

1 +

[

]

] =O(1).

20.3 Bibliographic remarks

Get our desktop app

Company

Features

Documentation

Resources