Bandit Algorithms

(Jeff_L) #1
28.5 Notes 319

Proof Leta∗= argmina∈A

∑n
t=1〈a,yt〉be the optimal action. Then

Rn=E

[n

t=1

〈At−ra∗,yt〉

]


+


∑n

t=1

〈ra∗−a∗,yt〉≤Rn(ra∗) + (1−r)n,

where the inequality follows from the definition ofAand Cauchy-Schwarz. By
Theorem 28.5 and Theorem 26.12,


Rn(a∗)≤

diamF(A)
η

+


η
2

E


[n

t=1

‖Yˆt‖^2 ∇F(Zt)− 1

]


,


whereZt∈[A ̄t,A ̄t+1] lies on the chord connectingA ̄tandA ̄t+1. The algorithm is
stable in the sense that no matter how the losses are chosen,A ̄t+1 cannot
be too far fromA ̄t. This also means thatZt is close toA ̄t. By definition,
η‖Yˆt‖≤ηd/(1−r) = 1/2. Combining this with Eq. (28.13) shows that
1 −‖Zt‖
1 −‖A ̄t‖


≤ sup
α∈[0,1]

1 −‖αA ̄t+ (1−α)A ̄t+1‖
1 −‖A ̄t‖

= max

{


1 ,


1 −‖A ̄t+1‖
1 −‖A ̄t‖

}


≤max

{


1 ,


1 +η‖Lˆt− 1 ‖
1 +η‖Lˆt‖

}


≤max

{


1 ,


1 +η‖Lˆt− 1 ‖
1 /2 +η‖Lˆt− 1 ‖

}


≤ 2.


The next step is to find the Hessian ofF, which is


∇^2 F(a) =

I


1 −‖a‖

+


aa>
‖a‖(1−‖a‖)^2


I


1 −‖a‖

Therefore∇^2 F(a)−^1 ≤(1−‖a‖)Iand so


E

[


‖Yˆt‖^2 ∇F(Zt)− 1

]


≤E


[


(1−‖Zt‖)‖Yˆt‖^2

]


=d^2 E

[


(1−‖Zt‖)Et〈At,yt〉^2
(1−‖A ̄t‖)^2

]


≤ 2 d.

The diameter is bounded by diamF(A ̄)≤log(1/(1−r)) and hence


Rn≤(1−r)n+^1
η

log

(


1


1 −r

)


+ηnd


1


η

log

(


1


2 ηd

)


+ 3ηnd

≤ 3


2 ndlog(n).

28.5 Notes


1 Findingat+1for both mirror descent and follow the regularized leader requires
solving a convex optimization problem. Provided the dimension is not too
large and the action set and potential are reasonably nice, there exist practical
approximation algorithms for this problem. The two-step process described in
Eqs. (28.7) and (28.8) is sometimes an easier way to go. Usually(28.7)can
be solved analytically while(28.8)can be quite expensive. In some important
Free download pdf