Bandit Algorithms

BIBLIOGRAPHY 534

T. Lattimore and R. Munos. Bounded regret for finite-armed structured
bandits. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and
K. Q. Weinberger, editors, Advances in Neural Information Processing Systems
27, NIPS, pages 550–558. Curran Associates, Inc., 2014. [235]
T. Lattimore and Cs. Szepesvári. The end of optimism? An asymptotic analysis
of finite-armed linear bandits. In A. Singh and J. Zhu, editors, Proceedings
of the 20th International Conference on Artificial Intelligence and Statistics,
volume 54 of Proceedings of Machine Learning Research, pages 728–737, Fort
Lauderdale, FL, USA, 20–22 Apr 2017. PMLR. [286]
T. Lattimore and Cs. Szepesvári. Cleaning up the neighbourhood: A full
classification for adversarial partial monitoring. In International Conference
on Algorithmic Learning Theory, 2019. [471, 473, 475]
T. Lattimore, K. Crammer, and Cs. Szepesvári. Linear multi-resource allocation
with semi-bandit feedback. In C. Cortes, N. D. Lawrence, D. D. Lee,
M. Sugiyama, and R. Garnett, editors, Advances in Neural Information
Processing Systems 28, NIPS, pages 964–972. Curran Associates, Inc., 2015.
[269]
T. Lattimore, B. Kveton, S. Li, and Cs. Szepesvári. TopRank: A practical
algorithm for online stochastic ranking. In S. Bengio, H. Wallach, H. Larochelle,
K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural
Information Processing Systems 31, pages 3949–3958. Curran Associates, Inc.,
2018. [249, 351, 374]
B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by
model selection. Annals of Statistics, pages 1302–1338, 2000. [78]
A. Lazaric and R. Munos. Hybrid stochastic-adversarial on-line learning. In
COLT, 2009. [223]
T. Le, Cs. Szepesvári, and R. Zheng. Sequential learning for multi-channel
wireless network monitoring with channel switching costs. IEEE Transactions
on Signal Processing, 62(22):5919–5929, 2014. [15]
L. Le Cam. Convergence of estimates under dimensionality restrictions. The
Annals of Statistics, 1(1):38–53, 1973. [193]
E. L. Lehmann and G. Casella. Theory of point estimation. Springer Science &
Business Media, 2006. [407]
H. Lei, A. Tewari, and S. A. Murphy. An actor-critic contextual bandit algorithm
for personalized mobile health interventions. 2017. [15]
J. Leike, T. Lattimore, L. Orseau, and M. Hutter. Thompson sampling is
asymptotically optimal in general environments. In Proceedings of the 32nd
Conference on Uncertainty in Artificial Intelligence, UAI, pages 417–426. AUAI
Press, 2016. [444]
H. R. Lerche. Boundary crossing of Brownian motion: Its relation to the law of
the iterated logarithm and to sequential analysis. Springer, 1986. [126]
D. A. Levin and Y. Peres. Markov chains and mixing times, volume 107. American
Mathematical Soc., 2017. [52]
