Bandit Algorithms
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
8.1 Asymptotically optimal UCB 113 Furthermore, lim sup n→∞ Rn log(n) ≤ ∑ i:∆i> 0 2 ∆i . (8.2) Choosingε= ∆i/2 inside the sum ...
8.1 Asymptotically optimal UCB 114 as required. Proof of Theorem 8.1 As usual, the starting point is the fundamental regret deco ...
8.2 Notes 115 we use Lemma 8.2 to get E [n ∑ t=1 I { μˆi(t−1) + √ 2 logf(t) Ti(t−1) ≥μ 1 −εandAt=i }] ≤E [n ∑ t=1 I { ˆμi(t−1) + ...
8.4 Exercises 116 regret on all problems (see Part IV). The policy proposed by Lai and Robbins [1985] was based on upper confide ...
8.4 Exercises 117 a policy by At= 1 if ˆμ 1 (t−1) + √ 2 logf(t) T 1 (t−1)≥^0 2 otherwise. (8.5) Suppose thatν= (P 1 ,P 2 ) ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
9.1 The MOSS algorithm 119 Before the proof we state and prove a strengthened version of Corollary 5.5. Theorem9.2. LetX 1 ,X 2 ...
9.1 The MOSS algorithm 120 Proof LetSt=tμˆt. Then P ( existss≥1 : ˆμs+ √ 4 s log+ ( 1 sδ ) + ∆≤ 0 ) =P ( existss≥1 :Ss+ √ 4 slog ...
9.1 The MOSS algorithm 121 but ∆ is sufficiently small in expectation that this price is small. Using the basic regret decomposi ...
9.2 Two problems 122 follows by naively bounding 1/8 + 4 log 8 + 2 √ πlog 8 + 1≤15. Then E ∑ i:∆i>max { 2∆, 8 √ k/n } ∆i ...
9.3 Notes 123 whereδis a parameter of the policy that determines the likelihood that the optimal arm is misidentified. The choic ...
9.4 Bibliographic remarks 124 the amount of optimism to match the instance. One of the authors has proposed two ways to do this ...
9.5 Exercises 125 (a)Show that for all 1-subgaussian bandits this new policy suffers regret at most Rn≤C ∑ i:∆i> 0 ∆i+ 1 ...
9.5 Exercises 126 byλ(t) =f(t)−tf′(t). Letτ=min{t:Bt=f(t)}andv(t) be the density ofτ. Show that v(t)≤ λ(t) √ 2 πt^3 exp ( − f(t) ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
10.1 Concentration for sums of Bernoulli random variables 128 More generally, therelative entropyorKullback-Leibler divergence i ...
10.1 Concentration for sums of Bernoulli random variables 129 Proof We will again use the Cramer-Chernoff method. Letλ >0 be ...
10.1 Concentration for sums of Bernoulli random variables 130 First notice thatU ≥ μˆand d(μ,ˆ·) is strictly increasing on [μ,ˆ1 ...
10.2 The KL-UCB algorithm 131 10.2 The KL-UCB algorithm The difference between KL-UCB and UCB is that Chernoff’s bound is used t ...
«
2
3
4
5
6
7
8
9
10
11
»
Free download pdf