BIBLIOGRAPHY 535
L. A. Levin. On the notion of a random sequence. In Soviet Math. Dokl.,
volume 14, pages 1413–1416, 1973. [142]
L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. Hyperband:
A novel bandit-based approach to hyperparameter optimization. Journal of
Machine Learning Research, 18(185):1–52, 2018a. [390]
S. Li, B. Wang, S. Zhang, and W. Chen. Contextual combinatorial cascading
bandits. In Proceedings of the 33rd International Conference on Machine
Learning, pages 1245–1253, 2016. [374]
S. Li, T. Lattimore, and Cs. Szepesvári. Online learning to rank with features.
2018b. [372]
T. Liang, H. Narayanan, and A. Rakhlin. On zeroth-order stochastic convex
optimization via random walks. arXiv preprint arXiv:1402.2667, 2014. [389]
T. Lin, B. Abrahao, R. Kleinberg, J. Lui, and W. Chen. Combinatorial partial
monitoring game with linear feedback and its applications. In International
Conference on Machine Learning, pages 901–909, 2014. [473]
T. Lin, J. Li, and W. Chen. Stochastic online greedy learning with semi-bandit
feedbacks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and
R. Garnett, editors, Advances in Neural Information Processing Systems 28,
pages 352–360. Curran Associates, Inc., 2015. [350]
N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information
and Computation, 108(2):212–261, 1994. [142, 154]
L. Lovász and S. Vempala. The geometry of logconcave functions and sampling
algorithms. Random Structures & Algorithms, 30(3):307–358, 2007. [307]
H. Luo, C.-Y. Wei, A. Agarwal, and J. Langford. Efficient contextual bandits in
non-stationary worlds. In S. Bubeck, V. Perchet, and P. Rigollet, editors,
Proceedings of the 31st Conference on Learning Theory, volume 75 of
Proceedings of Machine Learning Research, pages 1739–1776. PMLR, 2018. [361]
D. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge
University Press, 2003. [185]
S. Magureanu, R. Combes, and A. Proutière. Lipschitz bandits: Regret lower
bound and optimal algorithms. In The 27th Conference on Learning Theory
(COLT), pages 975–999, 2014. [237, 337]
O. Maillard. Robust risk-averse stochastic multi-armed bandits. In ALT, pages
218–233. Springer, Berlin, Heidelberg, 2013. [66]
O. Maillard, R. Munos, and G. Stoltz. Finite-time analysis of multi-armed bandits
problems with Kullback-Leibler divergences. In Proceedings of the Conference
on Learning Theory (COLT), 2011. [135]
S. Mannor and O. Shamir. From bandits to experts: On the value of side-
observations. In J. Shawe-Taylor, R. S. Zemel, P. L. Bartlett, F. Pereira, and
K. Q. Weinberger, editors, Advances in Neural Information Processing Systems
24, pages 684–692. Curran Associates, Inc., 2011. [339, 473]
S. Mannor and N. Shimkin. On-line learning with imperfect monitoring. In
Learning Theory and Kernel Machines, pages 552–566. Springer, 2003. [473]