Bandit Algorithms

5.2 The inequalities of Markov and Chebyshev 72 Figure 5.1The figure shows a probability density, with the tails shaded indicati ...

5.2 The inequalities of Markov and Chebyshev 73 is well behaved the inequality is rather loose. By assuming that higher moments ...

5.3 The Cramer-Chernoff method and subgaussian random variables 74 5.3 The Cramer-Chernoff method and subgaussian random variabl ...

5.3 The Cramer-Chernoff method and subgaussian random variables 75 A similar inequality holds for the left tail. By using the un ...

5.4 Notes 76 (a) IfXis Gaussian with mean zero and varianceσ^2 , thenXisσ-subgaussian. (b)IfX has mean zero and|X| ≤Balmost sure ...

5.4 Notes 77 √ 5-subgaussian: E[exp(λX)] =E [∞ ∑ i=0 λiXi i! ] ≤1 + ∑∞ i=2 E [ λi|X|i i! ] ≤1 + ∑∞ i=2 ∫∞ 0 P ( |X|≥ i!^1 /i λ x ...

5.5 Bibliographical remarks 78 An easy calculation shows thatMY(λ) = (1− 2 λ)−n/^2 forλ∈[0, 1 /2) and MY(λ) is undefined forλ≥ 1 ...

5.6 Exercises 79 5.5 (Berry–Esseen inequality) Let X 1 ,X 2 ,...,Xn be a sequence of independent and identically distributed ran ...

5.6 Exercises 80 (a)MXis convex and in particular dom(MX) is an interval containing zero. (b)MX(λ)≥eλE[X]for allλ∈dom(MX). (c)Fo ...

5.6 Exercises 81 (b) Prove Hoeffding’s inequality (5.8). Hint For part (a) it suffices to prove thatψX(λ)≤λ^2 (b−a)^2 /4. By Tay ...

5.6 Exercises 82 5.14(Bernstein’s inequality) LetX 1 ,...,Xnbe a sequence of independent random variables withXt−E[Xt]≤balmost s ...

5.6 Exercises 83 eachx∈[0,1] andt∈[n]. Prove for anyε >0 that P (n ∑ t=1 log(1/Xt)≥ε ) ≤ (ε n )n exp(n−ε). 5.17(Concentration ...

5.6 Exercises 84 Hint Use Jensen’s inequality to show thatexp(λE[Z])≤E[exp(λZ)] and then provide a naive bound on the moment gen ...

Part II Stochastic Bandits with Finitely Many Arms ...

86 Over the next few chapters we introduce the fundamental algorithms and tools of analysis for unstructured stochastic bandits ...

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...

6.1 Algorithm and regret analysis 88 is written formally as ˆμi(t) = 1 Ti(t) ∑t s=1 I{As=i}Xs, whereTi(t) = ∑t s=1I{As=i}is the ...

6.1 Algorithm and regret analysis 89 definitions of (ˆμj)jand the algorithm. Hence by Corollary 5.5, P(ˆμi(mk)−μi−μˆ 1 (mk) +μ 1 ...

6.2 Notes 90 plays arms alternately until it decides based on its observations to commit to a single arm for the remainder (Exer ...

6.3 Bibliographical remarks 91 plays the empirically best arm with probability 1−εtand otherwise explores uniformly at random. Y ...

«
1
2
3
4
5
6
7
8
9
10
»

Free download pdf

Get our desktop app

Company

Features

Documentation

Resources