Bandit Algorithms


BIBLIOGRAPHY


S. Mannor and J. N. Tsitsiklis. The sample complexity of exploration in the
multi-armed bandit problem. Journal of Machine Learning Research, 5:623–648,
December 2004. [387]
S. Mannor, V. Perchet, and G. Stoltz. Set-valued approachability and online
learning with partial monitoring. The Journal of Machine Learning Research,
15(1):3247–3295, 2014. [473]
H. Markowitz. Portfolio selection. The Journal of Finance, 7(1):77–91, 1952. [65]
M. E. Maron and J. L. Kuhns. On relevance, probabilistic indexing and
information retrieval. Journal of the ACM, 7(3):216–244, 1960. [375]
P. Martin-Löf. The definition of random sequences. Information and Control, 9
(6):602–619, 1966. [142]
A. Maurer and M. Pontil. Empirical Bernstein bounds and sample variance
penalization. arXiv preprint arXiv:0907.3740, 2009. [82, 110]
B. C. May, N. Korda, A. Lee, and D. S. Leslie. Optimistic Bayesian sampling in
contextual-bandit problems. The Journal of Machine Learning Research, 13
(1):2069–2106, 2012. [444]
C. McDiarmid. Concentration. In Probabilistic Methods for Algorithmic Discrete
Mathematics, pages 195–248. Springer, 1998. [78, 248]
H. B. McMahan and A. Blum. Online geometric optimization in the bandit
setting against an adaptive adversary. In COLT, volume 3120, pages 109–123.
Springer, 2004. [350]
H. B. McMahan and M. J. Streeter. Tighter bounds for multi-armed bandits
with expert advice. In COLT, 2009. [222]
P. Ménard and A. Garivier. A minimax and asymptotically optimal algorithm for
stochastic bandits. In S. Hanneke and L. Reyzin, editors, Proceedings of the
28th International Conference on Algorithmic Learning Theory, volume 76 of
Proceedings of Machine Learning Research, pages 223–237, Kyoto University,
Kyoto, Japan, 15–17 Oct 2017. PMLR. [116, 124, 135]
S. P. Meyn and R. L. Tweedie. Markov Chains and Stochastic Stability. Springer
Science & Business Media, 2012. [51, 52]
V. Mnih, Cs. Szepesvári, and J.-Y. Audibert. Empirical Bernstein stopping. In
Proceedings of the 25th International Conference on Machine Learning, ICML,
pages 672–679, New York, NY, USA, 2008. ACM. [82, 110]
J. A. Nelder and R. W. M. Wedderburn. Generalized linear models. Journal of
the Royal Statistical Society. Series A (General), 135(3):370–384, 1972. [235]
A. S. Nemirovski. Efficient methods for large-scale convex optimization problems.
Ekonomika i Matematicheskie Metody, 15, 1979. [322]
A. S. Nemirovsky and D. B. Yudin. Problem Complexity and Method Efficiency
in Optimization. Wiley, 1983. [322, 388, 389]
G. Neu. Explore no more: Improved high-probability regret bounds for non-
stochastic bandits. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama,
and R. Garnett, editors, Advances in Neural Information Processing Systems
28, NIPS, pages 3168–3176. Curran Associates, Inc., 2015a. [163, 164, 223, 349,
350]
