Bandit Algorithms

(Jeff_L) #1
20.1 Martingales and the method of mixtures 240

that

‖θˆt−θ∗‖Vt= max
x∈Sd−^1


Vt^1 /^2 x,θˆt−θ∗


= max
x∈Sd−^1

min
y∈Cε

[〈


Vt^1 /^2 (x−y),ˆθt−θ∗


+



Vt^1 /^2 y,θˆt−θ∗

〉]


< max
x∈Sd−^1
ymin∈C
ε

[


‖ˆθt−θ∗‖Vt‖x−y‖ 2 +


2 log

(


|Cε|
δ

)]


≤ε‖θˆt−θ∗‖Vt+


2 log

(


|Cε|
δ

)


.


Rearranging yields

‖ˆθt−θ∗‖Vt≤^1
1 −ε


2 log

(


|Cε|
δ

)


.


Now there is a tension in the choice ofε >0. The term in the denominator
suggests thatεshould be small, but by Lemma 20.1 the cardinality ofCεgrows
rapidly asεtends to zero. By lazily choosingε= 1/2,

P


(


‖θˆt−θ∗‖Vt≥ 2


2


(


dlog(6) + log

(


1


δ

)))


≤δ. (20.3)

Except for constants and other minor differences, this turns out to be about as
good as you can get. Unfortunately, however, this analysis only works because
Vtwas assumed to be deterministic. When the actions are chosen by a bandit
algorithm this assumption does not hold and the ideas need to be modified.

20.1 Martingales and the method of mixtures


We now remove the limiting assumptions in the previous section. Of course some
conditions are still required. For the remainder of this section it is assumed that
(At)∞t=1, (Xt)∞t=1and (ηt)∞t=1are sequences of random elements for which the
following hold:

1 There exists aθ∗∈Rdsuch thatXt=〈At,θ∗〉+ηtfor allt.
2 The noise is conditionally 1-subgaussian:

for allα∈Randt, E[exp(αηt)|Ft− 1 ]≤exp

(


α^2
2

)


a.s., (20.4)

whereFt− 1 =σ(A 1 ,X 1 ,...,At− 1 ,Xt− 1 ,At).
3 There is some regularization:λ >0.

The inclusion ofAtin the definition ofFt− 1 allows the noise to depend on past
choices, including the most recent action. This is often essential. For example, if
Free download pdf