Bandit Algorithms
38.10 Exercises 512 (i) We redefine the regret as follows: $R'_n = \mathbb{E}\left[\sum_{t=\tau}^{\tau+n-1} r_{A_t}(S_t) - n\rho^*(M_I)\right]$. Show that if $M$ is strongly connec ...
Bibliography Y. Abbasi-Yadkori. Forced-exploration based algorithms for playing in bandits with large action sets. PhD thesis, Un ...
BIBLIOGRAPHY 514 Y. Abbasi-Yadkori, P. Bartlett, V. Gabillon, A. Malek, and M. Valko. Best of both worlds: Stochastic & adve ...
BIBLIOGRAPHY 515 Information Processing Systems 24, NIPS, pages 1035–1043. Curran Associates, Inc., 2011. [338] A. Agarwal, D. P ...
BIBLIOGRAPHY 516 N. Ailon, Z. Karnin, and T. Joachims. Reducing dueling bandits to cardinal bandits. In Proceedings of the 31st I ...
BIBLIOGRAPHY 517 Learning Research, pages 1109–1134, Barcelona, Spain, 13–15 Jun 2014. PMLR. [338] J.-V. Audibert and S. Bubeck. ...
BIBLIOGRAPHY 518 continuum-armed bandit problem. In International Conference on Computational Learning Theory, pages 454–468. Sp ...
BIBLIOGRAPHY 519 D. Bernoulli. Exposition of a new theory on the measurement of risk. Econometrica: Journal of the Econometric S ...
BIBLIOGRAPHY 520 R. N. Bradt, S. M. Johnson, and S. Karlin. On sequential designs for maximizing the sum of n observations. The An ...
BIBLIOGRAPHY 521 S. Bubeck, O. Dekel, T. Koren, and Y. Peres. Bandit convex optimization: $\sqrt{T}$ regret in one dimension. In P. Gr ̈ ...
BIBLIOGRAPHY 522 O. Catoni. Challenging the empirical mean and empirical variance: a deviation study. Annales de l’Institut Henri ...
BIBLIOGRAPHY 523 W. Chen, W. Hu, F. Li, J. Li, Y. Liu, and P. Lu. Combinatorial multi-armed bandit with general reward functions ...
BIBLIOGRAPHY 524 stochastic bandits. In Advances in Neural Information Processing Systems 30 (NIPS), pages 1761–1769, 2017. [237, ...
BIBLIOGRAPHY 525 S. Dong and B. Van Roy. An information-theoretic analysis for Thompson sampling with many actions. arXiv preprin ...
BIBLIOGRAPHY 526 R. Fruit, M. Pirotta, and A. Lazaric. Near optimal exploration-exploitation in non-communicating markov decisio ...
BIBLIOGRAPHY 527 S. Gerchinovitz. Sparsity regret bounds for individual sequences in online linear regression. Journal of Machin ...
BIBLIOGRAPHY 528 A. György and Cs. Szepesvári. Shifting regret, mirror descent, and matrices. In M. F. Balcan and K. Q. Wein ...
BIBLIOGRAPHY 529 J. Honda and A. Takemura. Non-asymptotic analysis of a new bandit algorithm for semi-bounded rewards. Journal of ...
BIBLIOGRAPHY 530 Machine Learning Research, pages 1453–1461, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR. [339] K. Jun, A. Bharg ...
BIBLIOGRAPHY 531 E. Kaufmann. On Bayesian index policies for sequential resource allocation. The Annals of Statistics, 46(2):842– ...