Bandit Algorithms

20.1 Martingales and the method of mixtures 244

Lemma20.3.Lethbe a probability measure onRd, thenM ̄t=

∫

RdMt(x)dh(x)
is aF-adapted supermartingale withM ̄ 0 = 1.

The following theorem is the key result from which the confidence set will be derived.

Theorem20.4.For al lλ > 0 andδ∈(0,1),

P

(

existst∈N:‖St‖^2 Vt(λ)− 1 ≥2 log

(

1

δ

)

+ log

(

det(Vt(λ)) λd

))

≤δ.

The proof will be given momentarily. First though, the implications.

Theorem20.5.Letδ∈(0,1). Then with probability at least 1 −δit holds that
for allt∈N,

‖θˆt−θ∗‖Vt(λ)<

√

λ‖θ∗‖+

√

2 log

(

1

δ

)

+ log

(

detVt(λ) λd

)

.

Furthermore, if‖θ∗‖≤L, thenP(existst∈N+:θ∗∈C/ t)≤δwith

Ct=

{

θ∈Rd:‖θˆt− 1 −θ‖Vt− 1 (λ)≤

√

λL+

√

2 log

(

1

δ

)

+ log

(

detVt− 1 (λ) λd

)}

.

Proof We only have to compare‖St‖Vt(λ)− 1 and‖θˆt−θ∗‖Vt(λ).

‖θˆt−θ∗‖Vt(λ)=‖Vt(λ)−^1 St+ (Vt(λ)−^1 Vt−I)θ∗‖Vt(λ) ≤‖St‖Vt(λ)−^1 + (θ>∗(Vt(λ)−^1 Vt−I)Vt(λ)(Vt(λ)−^1 Vt−I)θ∗)^1 /^2 =‖St‖Vt(λ)− 1 +λ^1 /^2 (θ∗>(I−Vt(λ)−^1 Vt)θ∗)^1 /^2 ≤‖St‖Vt(λ)− 1 +λ^1 /^2 ‖θ∗‖.

And the result follows from Theorem 20.4.

Proof of Theorem 20.4 LetH=λI∈Rd×dandh=N(0,H−^1 ) and

M ̄t=

∫

Rd

Mt(x)dh(x)

=

1

√

(2π)ddet(H−^1 )

∫

Rd

exp

(

〈x,St〉−

1

2

‖x‖^2 Vt−

1

2

‖x‖^2 H

)

dx.

Completing the square,

〈x,St〉−

1

2 ‖x‖

2 Vt−

1

2 ‖x‖

2 H=

1

2 ‖St‖

2 (H+Vt)−^1 −

1

2 ‖x−(H+Vt)

− (^1) St‖ 2
H+Vt.
The first term‖St‖^2 (H+Vt)− 1 does not depend onxand can be moved outside the
integral, which leaves a quadratic ‘Gaussian’ term that may be integrated exactly
and results in
M ̄t=

(

det(H) det(H+V)

) 1 / 2

exp

(

1

2

‖St‖^2 (H+Vt)− 1

)

. (20.8)

Bandit Algorithms

∫

(

(

1

)

(

))

√

√

(

1

)

(

)

.

{

√

√

(

1

)

(

)}

.

∫

=

1

√

∫

(

1

2

1

2

)

1

1

1

1

(

) 1 / 2

(

1

2

)

. (20.8)

Get our desktop app

Company

Features

Documentation

Resources