Bandit Algorithms

(Jeff_L) #1
20.1 Martingales and the method of mixtures 244

Lemma20.3.Lethbe a probability measure onRd, thenM ̄t=


RdMt(x)dh(x)
is aF-adapted supermartingale withM ̄ 0 = 1.


The following theorem is the key result from which the confidence set will be
derived.

Theorem20.4.For al lλ > 0 andδ∈(0,1),


P

(


existst∈N:‖St‖^2 Vt(λ)− 1 ≥2 log

(


1


δ

)


+ log

(


det(Vt(λ))
λd

))


≤δ.

The proof will be given momentarily. First though, the implications.

Theorem20.5.Letδ∈(0,1). Then with probability at least 1 −δit holds that
for allt∈N,


‖θˆt−θ∗‖Vt(λ)<


λ‖θ∗‖+


2 log

(


1


δ

)


+ log

(


detVt(λ)
λd

)


.


Furthermore, if‖θ∗‖≤L, thenP(existst∈N+:θ∗∈C/ t)≤δwith


Ct=

{


θ∈Rd:‖θˆt− 1 −θ‖Vt− 1 (λ)≤


λL+


2 log

(


1


δ

)


+ log

(


detVt− 1 (λ)
λd

)}


.


Proof We only have to compare‖St‖Vt(λ)− 1 and‖θˆt−θ∗‖Vt(λ).

‖θˆt−θ∗‖Vt(λ)=‖Vt(λ)−^1 St+ (Vt(λ)−^1 Vt−I)θ∗‖Vt(λ)
≤‖St‖Vt(λ)−^1 + (θ>∗(Vt(λ)−^1 Vt−I)Vt(λ)(Vt(λ)−^1 Vt−I)θ∗)^1 /^2
=‖St‖Vt(λ)− 1 +λ^1 /^2 (θ∗>(I−Vt(λ)−^1 Vt)θ∗)^1 /^2
≤‖St‖Vt(λ)− 1 +λ^1 /^2 ‖θ∗‖.

And the result follows from Theorem 20.4.


Proof of Theorem 20.4 LetH=λI∈Rd×dandh=N(0,H−^1 ) and

M ̄t=


Rd

Mt(x)dh(x)

=


1



(2π)ddet(H−^1 )


Rd

exp

(


〈x,St〉−

1


2


‖x‖^2 Vt−

1


2


‖x‖^2 H

)


dx.

Completing the square,

〈x,St〉−

1


2 ‖x‖

2
Vt−

1


2 ‖x‖

2
H=

1


2 ‖St‖

2
(H+Vt)−^1 −

1


2 ‖x−(H+Vt)

− (^1) St‖ 2
H+Vt.
The first term‖St‖^2 (H+Vt)− 1 does not depend onxand can be moved outside the
integral, which leaves a quadratic ‘Gaussian’ term that may be integrated exactly
and results in
M ̄t=


(


det(H)
det(H+V)

) 1 / 2


exp

(


1


2


‖St‖^2 (H+Vt)− 1

)


. (20.8)

Free download pdf