Bandit Algorithms
10.2 The KL-UCB algorithm 132 second shows that the index of any other arm is not often much larger than the same value. These r ...
10.2 The KL-UCB algorithm 133 Proof Letε∈(0,∆) andu=a/d(μ+ε,μ+ ∆). Then E[κ] = ∑n s=1 P ( d(ˆμs,μ+ ∆)≤a s ) ≤ ∑n s=1 P ( μˆs≥μ+ε ...
10.3 Notes 134 follows from Lemmas 10.7 and 10.8. The first claim of the theorem is completed by substituting the above into the ...
10.4 Bibliographic remarks 135 appealing to the central limit theorem. The answer is no. First, the quality of the approximation ...
10.5 Exercises 136 Hint Chooseε 1 ,ε 2 to decrease slowly withnand use the first part of the theorem. 10.3(Concentration for bou ...
10.5 Exercises 137 (f)Find a choice ofhandTsuch that{Pθ:θ∈Θ}is the family of Bernoulli distributions. (g)Find a choice ofhandTsu ...
10.5 Exercises 138 (a)Implement Algorithm 8 and Algorithm 6 where the latter algorithm should be tuned for 1/2-subgaussian bandi ...
Part III Adversarial Bandits with Finitely Many Arms ...
140 Statistician George E. P. Box is famous for writing “All models are wrong, but some are useful”. In the stochastic bandit mo ...
141 The usefulness of the stochastic model depends on the setting. In particular, the designer of the bandit algorithm must care ...
142 bounded byC √ nklog(n)? The lazy way is to push part of the proof into the assumptions. For UCB this might mean replacing a ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
11.1 Adversarial bandit environments 144 The only source of randomness in the regret comes from the randomness in the actions of ...
11.2 Importance-weighted estimators 145 of the maximum function we have Rn(π,ν) = max i∈[k] E [n ∑ t=1 (Xti−XtAt) ] ≤E [ max i∈[ ...
11.2 Importance-weighted estimators 146 Being unbiased is a good start, but the variance of an estimator is also important. For ...
11.3 The Exp3 algorithm 147 In Exercise 11.4 we ask you to show that all unbiased estimators in this setting are importance-weig ...
11.4 Regret analysis 148 1:Input: n,k,η 2:SetSˆ 0 i= 0 for alli 3:fort= 1,...,ndo 4: Calculate the sampling distributionPt: Pti= ...
11.4 Regret analysis 149 The ratio in the product can be rewritten in terms ofPtby Wt Wt− 1 = ∑k j=1 exp(ηSˆt− 1 ,j) Wt− 1 exp(η ...
11.4 Regret analysis 150 ofXˆtj^2 leads to E ∑n t=1 ∑k j=1 Xˆtj^2 =E ∑n t=1 ∑k j=1 Ptj ( 1 −I{At=j}ytj Ptj ) 2 = ...
11.4 Regret analysis 151 − 0. 5 0 0. 5 − 0. 1 0 0. 1 x exp(x)−(1 +x) exp(x)−(1 +x+x^2 ) exp(x)−(1 +x+x^2 /2) Figure 11.2Approxim ...
«
3
4
5
6
7
8
9
10
11
12
»
Free download pdf