Bandit Algorithms
... which may be a different choice than mirror descent. Example 28.3. The exponential weights algorithm that ...

28.2 Regret analysis
... parts, the first of which is a little stronger than the second. To minimize clutter we abbreviate D_F by D. ...
... for a_1 = argmin_{b ∈ A} F(b). To see the second part note that

⟨a_t − a_{t+1}, y_t⟩ = (1/η) ⟨a_t − a_{t+1}, ∇F(a_t) − ∇F(ã_{t+1})⟩ = ...
... for all t. Then mirror descent with the unnormalized negentropy potential and η = √(2 log(d)/n) satisfies R_n ...
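On the probability simplex, mirror descent with the unnormalized negentropy potential F(a) = Σ_i (a_i log a_i − a_i) reduces to the exponential weights update: the unconstrained step multiplies each coordinate by exp(−η y_{ti}), and the projection back onto the simplex is a renormalization. A minimal full-information sketch (the function name and array layout are my own, not from the text):

```python
import numpy as np

def mirror_descent_negentropy(losses, eta):
    """Mirror descent with the unnormalized negentropy potential on the
    probability simplex (equivalent to exponential weights).

    losses: (n, d) array whose row t is the loss vector y_t.
    eta:    learning rate, e.g. sqrt(2 * log(d) / n).
    Returns the sequence of plays a_1, ..., a_n as an (n, d) array.
    """
    n, d = losses.shape
    a = np.full(d, 1.0 / d)               # a_1 = uniform minimizes F on the simplex
    plays = []
    for t in range(n):
        plays.append(a.copy())
        w = a * np.exp(-eta * losses[t])  # unconstrained mirror step
        a = w / w.sum()                   # KL projection onto the simplex
    return np.array(plays)
```

For losses in [0, 1]^d and η = √(2 log(d)/n), the realized regret against the best fixed action stays within the √(2 n log(d))-type bound the theorem describes.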
28.3 Online learning for bandits

Both Theorem 28.4 and Theorem 28.5 were presented for the oblivious case where (y_t)_{t=1}^n are ...
28.4 Linear bandits on the unit ball

... and assuming that η Ŷ_t + ∇F(a) ∈ ∇F(dom(F)) for all a ∈ A almost surely, then the regret of the ...
Surprisingly, follow the regularized leader with a carefully chosen potential improves ...
Proof. Let a* = argmin_{a ∈ A} Σ_{t=1}^n ⟨a, y_t⟩ be the optimal action. Then

R_n = E[ Σ_{t=1}^n ⟨A_t − r a*, y_t⟩ ] + Σ_{t=1}^n ⟨r a* − a*, y_t⟩ ≤ R ...

28.5 Notes
... special cases, however, the projection step can be written in closed form or efficiently approximated.

2. We saw t ...
... one or the other can sometimes be slightly easier. Note that lazy mirror descent is a variant of mirror descent that i ...
11. There is a nice application of online linear optimization to minimax theorems. Let X and Y be arb ...

28.6 Bibliographic remarks
... depth (but is also ten years older). As far as we know, the first application of mirror descent to bandits was ...

28.7 Exercises
(b) What happens if you replace mirror descent by follow the regularized leader?

P_{t+1} = argmin_{p ∈ A} Σ_{s=1}^t ⟨p, Ŷ_s⟩ ...
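For the unnormalized negentropy potential on the simplex, the follow-the-regularized-leader objective in the exercise, P_{t+1} = argmin_{p ∈ A} Σ_{s=1}^t ⟨p, Ŷ_s⟩ + F(p)/η, has a closed-form minimizer: a softmax of the negated cumulative loss estimates. A small sketch (the function name is hypothetical):

```python
import numpy as np

def ftrl_negentropy(cum_loss, eta):
    """FTRL step for the negentropy regularizer F(p) = sum_i (p_i log p_i - p_i)
    scaled by 1/eta on the probability simplex: returns
    argmin_p <p, cum_loss> + F(p)/eta.
    Setting the Lagrangian gradient to zero gives p_i proportional to
    exp(-eta * cum_loss_i), i.e. a softmax.
    """
    z = -eta * np.asarray(cum_loss, dtype=float)
    z -= z.max()            # shift for numerical stability; softmax is shift-invariant
    w = np.exp(z)
    return w / w.sum()
```

Iterating this as cum_loss accumulates Ŷ_1, ..., Ŷ_t reproduces, on the simplex with fixed η, the same trajectory as the mirror descent update; on other action sets the two algorithms can genuinely differ.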
(b) Show that if F_t = F/η_t and (η_t)_{t=1}^{n+1} is decreasing with η_n = η_{n+1}, then

R_n(a) ≤ (F(a) − min_{b ∈ A} F(b)) / η_n + Σ_{t=1}^n ( ⟨a_t − ...
(b) Prove that any algorithm satisfying Eq. (28.14) also satisfies

R_n ≤ k + C √( k ( 1 + min_{a ∈ [k]} Σ_{t=1}^n y_{ta} ) ) ...
28.14 (Minimax theorem) In this exercise you will prove simplifications of Sion's minimax theorem. (a) Use the ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
29.2 Reducing stochastic linear bandits to adversarial linear bandits

... regret for the two cases are defined as follows: R_n = Σ ...
Suppose the adversarial policy guarantees a bound B_n on the expected regret ...

29.3 Stochastic linear bandits with parameter noise
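The reduction in this fragment can be illustrated end to end: run any algorithm with an adversarial regret guarantee on parameter-noise losses y_t = θ + ξ_t, and because the action A_t is chosen before ξ_t is revealed, the expected adversarial regret bound B_n transfers to the stochastic pseudo-regret against θ. The sketch below uses projected online gradient descent on the unit ball as a stand-in adversarial full-information algorithm (my choice for illustration, not the book's policy):

```python
import numpy as np

def reduction_pseudo_regret(theta, noise_sd, n, eta, rng):
    """Run an adversarial online linear optimization algorithm (projected OGD
    on the unit ball, a stand-in) on parameter-noise losses y_t = theta + xi_t,
    and return the stochastic pseudo-regret measured against the mean loss theta.
    """
    d = theta.shape[0]
    a = np.zeros(d)                      # initial action inside the unit ball
    pseudo_loss = 0.0
    for _ in range(n):
        y = theta + rng.normal(0.0, noise_sd, size=d)  # parameter-noise loss
        pseudo_loss += float(a @ theta)  # pseudo-regret only sees the mean loss
        a = a - eta * y                  # gradient step on the linear loss <a, y>
        norm = np.linalg.norm(a)
        if norm > 1.0:
            a = a / norm                 # project back onto the unit ball
    best = -np.linalg.norm(theta) * n    # best fixed action is a* = -theta/||theta||
    return pseudo_loss - best
```

With η of order 1/√n, OGD's adversarial regret is O(√n), so the pseudo-regret returned here is also O(√n); this is exactly the transfer the section describes, with B_n supplied by the adversarial analysis.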
Combining the stochastic linear bandits with parameter noise model with ...