Bandit Algorithms

(Jeff_L) #1
20.2 Notes 245

And things have worked out beautifully. Since M ̄t is a nonnegative
supermartingale, the maximal inequality (Theorem 3.9) shows that

P

(


sup
t∈N

log(M ̄t)≥log

(


1


δ

))


=P


(


sup
t∈N

M ̄t≥^1
δ

)


≤δ.

The result follows by substituting Eq. (20.8) into the above display and
rearranging.

20.2 Notes


1 Recall from the previous chapter that when‖At‖ 2 ≤Lis assumed, then

detVt(λ)
λd


(


trace

(


Vt(λ)
λd

))d

(


1 +


nL^2
λd

)d

. (20.9)


In general the log determinant form should be preferred when confidence
intervals are used as part of an algorithm, but the right-hand side has a
concrete form that can be useful when stating regret bounds.
2 The bounds in the previous note can be quite tight, which suggests the results
provided by Theorem 20.5 suffer from a worse dependence on the dimension
than what was possible in the fixed design setting (Eq. (20.2)). You might
ask whether or not this is really necessary? In Exercise 20.2 you answer this
question in the affirmative. In particular, there exists a sequential design such
that

E

[


〈x,θˆt−θ∗〉^2 /‖x‖^2 Vt− 1

]


= Ω(d).

When the actions are chosen using a fixed design, however, integrating Eq. (20.2)
shows thatE[〈x,θˆt−θ∗〉^2 /‖x‖^2 V− 1
t

] =O(1).


3 Supermartingales arise naturally in proofs relying on the Cramer-Chernoff
method. Just one example is the proof of Lemma 12.2. One could rewrite
most of the proofs involving sums of random variables relying on the Cramer-
Chernoff method in a way that it would become clear that the proof hinges on
the supermartingale property of an appropriate sequence.

20.3 Bibliographic remarks


Bounds like those given in Theorem 20.5 are called self-normalized bounds [de la
Pe ̃na et al., 2008]. The method of mixtures goes back to the work by Robbins
and Siegmund [1970]. In practice, the improvement provided by the method of
mixtures relative to the covering arguments is quite large. A historical account
of martingale methods in sequential analysis is by Lai [2009]. A simple proof of
Lemma 20.1 appears as Lemma 2.5 in the book by van de Geer [2000]. Calculating
covering numbers (or related packing numbers) is a whole field by itself, with
Free download pdf