Bandit Algorithms
19.3 Regret analysis 232 The proof of Theorem 19.2 depends on the following lemma. Lemma19.4.LetV 0 be positive definite andx 1 ...
19.4 Notes 233 By part (a) we also havert≤2, which combined withβn≥max{ 1 ,βt}yields rt≤ 2 ∧ 2 √ βt‖At‖Vt−− 11 ≤ 2 √ βn ( 1 ∧‖At ...
19.4 Notes 234 made in Theorem 19.2 depends on the dimensiondand becomes vacuous when dis large or infinite. This dependence ari ...
19.5 Bibliographic remarks 235 where g : R → R is called the link function. A common choice is g(p) = log(p/(1−p)), which yields ...
19.6 Exercises 236 19.2(Elliptical potentials) LetV 0 =λIandx 1 ,...,xn∈Rdbe a sequence of vectors with‖xt‖ 2 ≤Lfor allt∈[n]. Th ...
19.6 Exercises 237 This exercise is inspired by the work of Perchet and Rigollet [2013] who focus on improving the regret bound ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
Confidence Bounds for Least Squares Estimators 239 Comparingθ∗andθˆtin the directionx∈Rdwe have 〈x,θˆt−θ∗〉= 〈 x,Vt−^1 ∑t s=1 AsX ...
20.1 Martingales and the method of mixtures 240 that ‖θˆt−θ∗‖Vt= max x∈Sd−^1 〈 Vt^1 /^2 x,θˆt−θ∗ 〉 = max x∈Sd−^1 min y∈Cε [〈 Vt^ ...
20.1 Martingales and the method of mixtures 241 the rewards are Bernoulli. We have now dropped the assumption that (At)∞t=1 are ...
20.1 Martingales and the method of mixtures 242 Chernoff method leads to P ( 1 2 ‖ˆθt−θ∗‖^2 Vt≥log(1/δ) ) =P ( exp ( max x∈Rd 〈x ...
20.1 Martingales and the method of mixtures 243 − 5 0 5 0 1 s= 1 − 5 0 5 0 1 s= 3 Figure 20.1The plots depict Laplace’s approxim ...
20.1 Martingales and the method of mixtures 244 Lemma20.3.Lethbe a probability measure onRd, thenM ̄t= ∫ RdMt(x)dh(x) is aF-adap ...
20.2 Notes 245 And things have worked out beautifully. Since M ̄t is a nonnegative supermartingale, the maximal inequality (Theo ...
20.4 Exercises 246 open questions even in the most obvious examples. The main reference is by Rogers [1964], which by now is a l ...
20.4 Exercises 247 The definitions can be repeated for pseudo-metric spaces. LetXbe a set and d:X×X→[0,∞) be a function that is ...
20.4 Exercises 248 20.8(Extension of Hoeffding–Azuma) The following simple extension of Hoeffding–Azuma is often useful. Letn∈N+ ...
20.4 Exercises 249 (d) Find anfsuch that ∫∞ 0 f(λ)dλ= 1 andf(λ)≥0 for allλ∈Rand log ( 1 λf(λ) ) = (1 +o(1)) log log ( 1 λ ) asλ→ ...
20.4 Exercises 250 The quantities pθ′(Xs)/pθ(Xs) are calledlikelihood ratios. That the product of likelihood ratios forms a mart ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
«
8
9
10
11
12
13
14
15
16
17
»
Free download pdf