Bandit Algorithms
32.4 Notes 372 nm. The proof of the first part is completed by using Lemma 32.4 to bound the first term and Lemma 32.7 to bound ...
32.4 Notes 373 modelan adversary secretly chooses a sequence of setsS 1 ,...,SnwithSt⊆[`]. In each roundtthe learner choosesAt∈A ...
32.5 Bibliographic remarks 374 32.5 Bibliographic remarks The policy and analysis presented in this chapter is by the authors an ...
32.6 Exercises 375 “The theoretical model that justifies ranking documents in this way is the probabilistic ranking principle [R ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
33.1 Simple regret 377 subgaussian bandit. Then for alln≥k, Rsimplen (π,ν)≤min ∆≥ 0 ∆ + ∑ i:∆i(ν)>∆ ∆i(ν) exp ( −bn/kc∆i(ν ...
33.2 Best-arm identification with a fixed confidence 378 Proof By the regret decomposition identity (4.5), Rn(π,ν) =nE [k ∑ i=1 ...
33.2 Best-arm identification with a fixed confidence 379 The objective in fixed confidence best-arm identification is to find a ...
33.2 Best-arm identification with a fixed confidence 380 from the high probability Pinsker’s inequality (Theorem 14.2) and the s ...
33.2 Best-arm identification with a fixed confidence 381 (c) WhenE=E^2 N(1) andν∈Ehas a unique optimal arm, then c∗(ν)−^1 = 1 2 ...
33.2 Best-arm identification with a fixed confidence 382 If the inequality is tight, then we might guess that a reasonable stopp ...
33.2 Best-arm identification with a fixed confidence 383 |i∗(ν)|= 1it holds that lim δ→ 0 Eνπ[τ] log(1/δ) =c∗(ν). Note thatπandτ ...
33.2 Best-arm identification with a fixed confidence 384 and ˆμ≈μand Zt= inf ̃ν∈Ealt(ˆν(t)) ∑k i=1 Ti(t)(ˆμi(t)−μi( ̃ν))^2 2 ≈t ...
33.3 Best-arm identification with a budget 385 33.3 Best-arm identification with a budget In the fixed-budget variant of best-ar ...
33.4 Notes 386 ∑k i=1min{^1 /∆ (^2) i, 1 /∆ (^2) min}. Then H 2 (μ)≤H 1 (μ)≤(1 + log(k))H 2 (μ). (33.8) Furthermore, both inequa ...
33.5 Bibliographical remarks 387 of selecting a suboptimal arm decays only polynomially withn. This result holds no matter howAn ...
33.5 Bibliographical remarks 388 is to find anε-optimal arm with high probability with as few samples as possible. After a dry s ...
33.5 Bibliographical remarks 389 the simple-regret for algorithms that construct gradient estimates by injecting random noise (a ...
33.6 Exercises 390 results in Section 33.2. Glynn and Juneja [2015] gives further pointers to this literature, while connecting ...
33.6 Exercises 391 (a)For eachε >0 andδ∈(0,1) and number of armsk >1 design a policyπ and stopping timeτsuch that for allν ...
«
15
16
17
18
19
20
21
22
23
24
»
Free download pdf