Bandit Algorithms


BIBLIOGRAPHY


T. Lattimore and R. Munos. Bounded regret for finite-armed structured
bandits. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and
K. Q. Weinberger, editors, Advances in Neural Information Processing Systems
27, NIPS, pages 550–558. Curran Associates, Inc., 2014. [235]
T. Lattimore and Cs. Szepesvári. The end of optimism? an asymptotic analysis
of finite-armed linear bandits. In A. Singh and J. Zhu, editors, Proceedings
of the 20th International Conference on Artificial Intelligence and Statistics,
volume 54 of Proceedings of Machine Learning Research, pages 728–737, Fort
Lauderdale, FL, USA, 20–22 Apr 2017. PMLR. [286]
T. Lattimore and Cs. Szepesvári. Cleaning up the neighbourhood: A full
classification for adversarial partial monitoring. In International Conference
on Algorithmic Learning Theory, 2019. [471, 473, 475]
T. Lattimore, K. Crammer, and Cs. Szepesvári. Linear multi-resource allocation
with semi-bandit feedback. In C. Cortes, N. D. Lawrence, D. D. Lee,
M. Sugiyama, and R. Garnett, editors, Advances in Neural Information
Processing Systems 28, NIPS, pages 964–972. Curran Associates, Inc., 2015.
[269]
T. Lattimore, B. Kveton, S. Li, and Cs. Szepesvári. Toprank: A practical
algorithm for online stochastic ranking. In S. Bengio, H. Wallach, H. Larochelle,
K. Grauman, N. Cesa-Bianchi, and R. Garnett, editors, Advances in Neural
Information Processing Systems 31, pages 3949–3958. Curran Associates, Inc.,
2018. [249, 351, 374]
B. Laurent and P. Massart. Adaptive estimation of a quadratic functional by
model selection. Annals of Statistics, pages 1302–1338, 2000. [78]
A. Lazaric and R. Munos. Hybrid stochastic-adversarial on-line learning. In
COLT, 2009. [223]
T. Le, Cs. Szepesvári, and R. Zheng. Sequential learning for multi-channel
wireless network monitoring with channel switching costs. IEEE Transactions
on Signal Processing, 62(22):5919–5929, 2014. [15]
L. Le Cam. Convergence of estimates under dimensionality restrictions. The
Annals of Statistics, 1(1):38–53, 1973. [193]
E. L. Lehmann and G. Casella. Theory of point estimation. Springer Science &
Business Media, 2006. [407]
H. Lei, A. Tewari, and S. A. Murphy. An actor-critic contextual bandit algorithm
for personalized mobile health interventions. 2017. [15]
J. Leike, T. Lattimore, L. Orseau, and M. Hutter. Thompson sampling is
asymptotically optimal in general environments. In Proceedings of the 32nd
Conference on Uncertainty in Artificial Intelligence, UAI, pages 417–426. AUAI
Press, 2016. [444]
H. R. Lerche. Boundary crossing of Brownian motion: Its relation to the law of
the iterated logarithm and to sequential analysis. Springer, 1986. [126]
D. A. Levin and Y. Peres. Markov chains and mixing times, volume 107. American
Mathematical Soc., 2017. [52]
