Bandit Algorithms

28.5 Notes 319

Proof Leta∗= argmina∈A

∑n t=1〈a,yt〉be the optimal action. Then

Rn=E

[n ∑

t=1

〈At−ra∗,yt〉

]

+

∑n

t=1

〈ra∗−a∗,yt〉≤Rn(ra∗) + (1−r)n,

where the inequality follows from the definition ofAand Cauchy-Schwarz. By
Theorem 28.5 and Theorem 26.12,

Rn(a∗)≤

diamF(A) η

+

η 2

E

[n ∑

t=1

‖Yˆt‖^2 ∇F(Zt)− 1

]

,

whereZt∈[A ̄t,A ̄t+1] lies on the chord connectingA ̄tandA ̄t+1. The algorithm is
stable in the sense that no matter how the losses are chosen,A ̄t+1 cannot
be too far fromA ̄t. This also means thatZt is close toA ̄t. By definition,
η‖Yˆt‖≤ηd/(1−r) = 1/2. Combining this with Eq. (28.13) shows that
1 −‖Zt‖
1 −‖A ̄t‖

≤ sup α∈[0,1]

1 −‖αA ̄t+ (1−α)A ̄t+1‖ 1 −‖A ̄t‖

= max

{

1 ,

1 −‖A ̄t+1‖ 1 −‖A ̄t‖

}

≤max

{

1 ,

1 +η‖Lˆt− 1 ‖ 1 +η‖Lˆt‖

}

≤max

{

1 ,

1 +η‖Lˆt− 1 ‖ 1 /2 +η‖Lˆt− 1 ‖

}

≤ 2.

The next step is to find the Hessian ofF, which is

∇^2 F(a) =

I

1 −‖a‖

+

aa> ‖a‖(1−‖a‖)^2

≥

I

1 −‖a‖

Therefore∇^2 F(a)−^1 ≤(1−‖a‖)Iand so

E

[

‖Yˆt‖^2 ∇F(Zt)− 1

]

≤E

[

(1−‖Zt‖)‖Yˆt‖^2

]

=d^2 E

[

(1−‖Zt‖)Et〈At,yt〉^2 (1−‖A ̄t‖)^2

]

≤ 2 d.

The diameter is bounded by diamF(A ̄)≤log(1/(1−r)) and hence

Rn≤(1−r)n+^1 η

log

(

1

1 −r

)

+ηnd

≤

1

η

log

(

1

2 ηd

)

+ 3ηnd

≤ 3

√

2 ndlog(n).

28.5 Notes

1 Findingat+1for both mirror descent and follow the regularized leader requires solving a convex optimization problem. Provided the dimension is not too large and the action set and potential are reasonably nice, there exist practical approximation algorithms for this problem. The two-step process described in Eqs. (28.7) and (28.8) is sometimes an easier way to go. Usually(28.7)can be solved analytically while(28.8)can be quite expensive. In some important

Bandit Algorithms

]

+

+

E

]

,

{

1 ,

}

{

1 ,

}

{

1 ,

}

≤ 2.

I

+

≥

I

[

]

≤E

[

]

[

]

(

1

)

≤

1

(

1

)

√

28.5 Notes

Get our desktop app

Company

Features

Documentation

Resources