Bandit Algorithms

36.2 Frequentist analysis 432 1:Input Cumulative distribution functionsF 1 (1),...,Fk(1) 2:fort= 1,...,ndo 3: Sampleθi(t)∼Fi(t) ...

36.2 Frequentist analysis 433 We start with a straightforward decomposition: E[Ti(n)] =E [n ∑ t=1 I{At=i} ] =E [n ∑ t=1 I{At=i,E ...

36.2 Frequentist analysis 434 calculation (Exercise 36.5) we get E [n ∑ t=1 I{At=i,Eic(t) occurs} ] ≤E [ ∑ t∈T I{At=i} ] +E [ ∑ ...

36.3 Linear bandits 435 0 100 200 300 400 0 100 200 300 Thompson sampling Regret Proportion × Regret 2 0 100 200 300 400 0 100 2 ...

36.3 Linear bandits 436 O(d √ nlog(n) log(n/d)), which matches the upper bound obtained for Lin-UCB in Corollary 19.3. Proof We ...

36.4 Information theoretic analysis 437 Putting together the pieces shows that BRn≤2 + 2 √ 2 dnβ^2 log ( 1 + nS^2 L^2 d ) 36.3.1 ...

36.4 Information theoretic analysis 438 from context we writeHP. LetG ⊆Fbe a sub-σ-algebra. Then theconditional entropyofXgivenG ...

36.4 Information theoretic analysis 439 π= (πt)nt=1be a policy andX∈[0,1]n×kand (At)nt=1be random elements on some probability s ...

36.4 Information theoretic analysis 440 we have BRn=E [n ∑ t=1 (XtA∗−XtAt) ] =E [n ∑ t=1 Et− 1 [XtA∗−XtAt] ] =E [n ∑ t=1 √ It− 1 ...

36.5 Notes 441 andAtare conditionally independent givenA∗. Hence, It− 1 (A∗;Ft) =Et− 1 [ E ̃t− 1 [ log P ̃t− 1 (A∗=a|XtAt) ̃Pt− ...

36.5 Notes 442 ‘approximate’ Thompson sampling and approximating a sample from each of the above three choices may lead to diffe ...

36.5 Notes 443 (pushforward ofPunder (X,Y)). From this form, we immediately see that mutual information is symmetric (in line wi ...

36.6 Bibliographic remarks 444 10 The information-theoretic ideas in Section 36.4 suggest that rather than samplingAtfrom the po ...

36.7 Exercises 445 by Abeille and Lazaric [2017a] and a spectral version by Koc ́ak et al. [2014]. A recent paper analyzes the c ...

36.7 Exercises 446 Hint For (a) you may find it useful to know that fory≥0, 1 −Φ(y)≥ exp(−y^2 /2) y+ √ y^2 + 4 , where Φ(y) =√^1 ...

36.7 Exercises 447 The result continues to hold ifQis the space of all probability measures on (E,B(E)). This formulation avoids ...

Part VIII Beyond Bandits ...

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...

37.1 Finite adversarial partial monitoring problems 450 37.1 Finite adversarial partial monitoring problems To reduce clutter we ...

37.1 Finite adversarial partial monitoring problems 451 the following problem: L= ( 0 0 1 1 ) , Φ = ( 1 1 1 1 ) . Clearly, in th ...

«
18
19
20
21
22
23
24
25
26
27
»

Free download pdf

Get our desktop app

Company

Features

Documentation

Resources