Bandit Algorithms
BIBLIOGRAPHY 532 T. Kocák, M. Valko, R. Munos, and S. Agrawal. Spectral Thompson sampling. In AAAI, pages 1911–1917, 2014. [445 ...
BIBLIOGRAPHY 533 T. L. Lai. Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics, pages 10 ...
BIBLIOGRAPHY 534 T. Lattimore and R. Munos. Bounded regret for finite-armed structured bandits. In Z. Ghahramani, M. Welling, C. ...
BIBLIOGRAPHY 535 L. A. Levin. On the notion of a random sequence. In Soviet. Math. Dokl., volume 14, pages 1413–1416, 1973. [142] ...
BIBLIOGRAPHY 536 S. Mannor and J. N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal ...
BIBLIOGRAPHY 537 G. Neu. First-order regret bounds for combinatorial semi-bandits. In P. Grünwald, E. Hazan, and S. Kale, edit ...
BIBLIOGRAPHY 538 editors, Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machin ...
BIBLIOGRAPHY 539 H. Robbins and D. Siegmund. A class of stopping rules for testing parametric hypotheses. In Proceedings of the S ...
BIBLIOGRAPHY 540 A. Sani, A. Lazaric, and R. Munos. Risk-aversion in multi-armed bandits. In F. Pereira, C. J. C. Burges, L. Bot ...
BIBLIOGRAPHY 541 A. Slivkins and E. Upfal. Adapting to a changing environment: the Brownian restless bandits. In COLT, pages 343– ...
BIBLIOGRAPHY 542 V. Syrgkanis, A. Krishnamurthy, and R. Schapire. Efficient algorithms for adversarial contextual learning. In Pr ...
BIBLIOGRAPHY 543 L. Tran-Thanh, A. Chapman, A. Rogers, and N. R. Jennings. Knapsack based optimal policies for budget-limited mu ...
BIBLIOGRAPHY 544 P. Varaiya, J. Walrand, and C. Buyukkoc. Extensions of the multiarmed bandit problem: The discounted case. IEEE ...
BIBLIOGRAPHY 545 of Machine Learning Research, pages 1113–1122, Lille, France, 07–09 Jul 2015. PMLR. [350] P. Whittle. Multi-arm ...
BIBLIOGRAPHY 546 M. Zoghi, S. Whiteson, R. Munos, and M. Rijke. Relative upper confidence bound for the k-armed dueling bandit p ...
Index 1-armed bandit, 9, 68, 116 Bayesian, 413–417 χ-squared distance, 184, 194 σ-algebra, 20 restriction of, 41 a.s., 34 Abel ...
INDEX 548 concave, 292 conditional entropy, 438 conditional expectation, 32 conditional independence, 35 conditional probability ...
INDEX 549 generalized linear model, 234 Gittins index, 360, 417 globally observable, 455 gradient descent, 311 Hahn decompositio ...
INDEX 550 minimax optimal, 172 mirror descent, 152, 310, 345, 355 model, 397 MOSS, 116, 118 multiclass classification with band ...
INDEX 551 reinforcement learning, 12, 91, 425, 487 relative entropy, 137, 179–182, 293 restless bandit, 360, 427 reward stack, 63 ...