Bandit Algorithms

24.1 Hypercube 272 24.1 Hypercube The first lower bound is for the hypercube action set and shows that the upper bounds in Chapt ...

24.2 Unit ball 273 ofθis at least Rn(A,θ) =Eθ [n ∑ t=1 ∑d i=1 (sign(θi)−Ati)θi ] ≥ √ 1 n ∑d i=1 Eθ [n ∑ t=1 I{sign(Ati) 6 = sign ...

24.2 Unit ball 274 ∑t s=1A (^2) si≥n/d}. Then, Rn(A,θ) = ∆Eθ [n ∑ t=1 ∑d i=1 ( √^1 d −Atisign(θi) )] ≥ ∆ √ d 2 Eθ [n ∑ t=1 ∑d i= ...

24.3 Sparse parameter vectors 275 The proof is completed using the randomization hammer: ∑ θ∈{±∆}d Rn(A,θ)≥ ∆ √ d 2 ∑d i=1 ∑ θ∈{ ...

24.4 Unrealizable case 276 Next define matrixV∈Rp×dto be a block-diagonal matrix with 1×kblocks, each containing the row vector ...

24.4 Unrealizable case 277 Then letε=supa∈A|〈θ,a〉−μ(a)|be the maximum error. It would be very pleasant to have an algorithm such ...

24.5 Notes 278 By choosingA={T 1 (n)≤n/ 2 }we have Rn(μ) +Rn(μ′)≥ n∆ 4 exp(−2) = n(k−1) 4 V exp(−2). Therefore by the assumption ...

24.7 Exercises 279 of one of the authors on the Pareto-regret frontier for bandits, which characterizes what tradeoffs are avail ...

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...

25.1 An asymptotic lower bound for fixed action sets 281 lim infn→∞λmin(G ̄n)/log(n)> 0. Furthermore, for anya∈Ait holds that ...

25.1 An asymptotic lower bound for fixed action sets 282 for any eventEwe have P(E) +P′(Ec)≥ 1 2 exp (−D(P,P′)) = 1 2 exp ( − 1 ...

25.1 An asymptotic lower bound for fixed action sets 283 Therefore by choosingEto be the event thatTa∗(n)< n/2 and using (25. ...

25.2 Clouds looming for optimism 284 Now sincea∗will be chosen linearly often it is easily shown for suboptimala thatlimn→∞‖a−a∗ ...

25.2 Clouds looming for optimism 285 The constraints mean that 1 α(a 3 )ε^2 γ^2 +α(a 2 )=α(alim 1 )→∞‖a^2 ‖ (^2) H(α)− 1 ≤^1 2 γ ...

25.3 Notes 286 and suffers regret of Ω(log(n)/ε). The key take-away from this is that optimistic algorithms do not choose action ...

25.5 Exercises 287 (a)c(A∪{a},θ)> c(A,θ). (b)c(A∪{a},θ)< c(A,θ). ...

Part VI Adversarial Linear Bandits ...

289 In the next few chapters we will consider adversarial linear bandits, which superficially can be thought of as the adversari ...

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...

26.1 Convex sets and functions 291 (a) (b) (c) (d) (e) (f) Figure 26.1(a) is a convex set. (b) is a nonconvex set. (c) is the co ...

«
10
11
12
13
14
15
16
17
18
19
»

Free download pdf

Get our desktop app

Company

Features

Documentation

Resources