Bandit Algorithms
26.2 Jensen’s inequality 292 is obsessed with convex functions because all local minimums are global (see Fig. 26.1). This means ...
26.3 Bregman divergence 293 y x Df(x, y) f(x) f(y)+〈∇f(y), x−y〉 Figure 26.2Bregman divergence where∇f(y)∈Rdis the gradient offat ...
26.4 Legendre functions 294 26.4 Legendre functions 0 1 2 3 − 1 − 0. 5 0 xlog(x)−x Figure 26.3f(x) =xlog(x)−x: the archetypical ...
26.4 Legendre functions 295 Example26.10.Letf(x) = ∑ ixilog(xi)−xibe the unnormalized negentropy, which we met in Example 26.5. ...
26.5 Optimization 296 better point −∇f(x) −∇f(x∗) Figure 26.4Illustration of first-order optimality conditions. The point at the ...
26.6 Projections 297 Proposition26.13. Letf :Rd →R ̄ be a convex function andA ⊂Rda nonempty convex set. Then for anyx∗∈dom(∇f)i ...
26.8 Bibliographic remarks 298 andclosedif its complementAc=Rd\Ais open. TheboundaryofAis denoted by∂Aand is the set of points i ...
26.9 Exercises 299 (e)f(x) =|x|onR. (f)f(x) = max{|x|,x^2 }onR. 26.4Prove Theorem 26.3. 26.5Prove Corollary 26.7. 26.6Prove Prop ...
26.9 Exercises 300 Show thatxi= { α ifi < m (1−(m−1)α)yi/ ∑d j=myj otherwise. ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
27.2 Regret analysis 302 functionP ̃t:A→[0,1] given by P ̃t(a)∝exp ( −η ∑t−^1 s=1 Yˆs(a) ) , whereη >0 is the learning rate. ...
27.2 Regret analysis 303 1:Input Action setA ⊂ Rd, learning rateη, exploration distributionπ, exploration parameterγ 2:fort= 1, ...
27.3 Continuous exponential weights 304 which implies that |ηYˆt(a)|≤ η γmaxν∈Aν Q− (^1) (π)ν=η γmaxν∈A‖ν‖ (^2) Q− (^1) (π). (2 ...
27.3 Continuous exponential weights 305 play the biggest role in the feasibility of sampling from a measure are(a)what is the fo ...
27.4 Notes 306 Proof of Theorem 27.3 Choosingγ=dηensures that|η〈a,Yˆt〉|≤1 for alla∈A. The standard argument (Exercise 27.9) show ...
27.5 Bibliographic remarks 307 27.5 Bibliographic remarks The results in Sections 27.1 and 27.2 follow the article by Bubeck et ...
27.6 Exercises 308 27.5 (Changing action sets) Provide the necessary corrections to Algorithm 15 and its analysis to prove the r ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
28.1 Online linear optimization 310 〈at,yt〉. Actions are not capitalized in this section because the algorithms presented here d ...
28.1 Online linear optimization 311 occurs in the interior ofD ⊆Aand that∇Φt(at+1) = 0 (see Exercise 28.1). This means thatηyt=∇ ...
«
11
12
13
14
15
16
17
18
19
20
»
Free download pdf