Bandit Algorithms


BIBLIOGRAPHY


editors, Proceedings of the 35th International Conference on Machine Learning,
volume 80 of Proceedings of Machine Learning Research, pages 4102–4110.
PMLR, 10–15 Jul 2018. [339]
J. Poland. FPL analysis for adaptive bandits. In O. B. Lupanov, O. M.
Kasim-Zade, A. V. Chaskin, and K. Steinhöfel, editors, Stochastic Algorithms:
Foundations and Applications, pages 58–69, Berlin, Heidelberg, 2005. Springer
Berlin Heidelberg. [350]
D. Pollard. A user’s guide to measure theoretic probability, volume 8. Cambridge
University Press, 2002. [40]
E. L. Presman and I. N. Sonin. Sequential control with incomplete information.
The Bayesian approach to multi-armed bandit problems. Academic Press, 1990.
[15, 427]
M. Puterman. Markov decision processes: discrete stochastic dynamic
programming, volume 414. Wiley, 2009. [500, 503]
C. Qin, D. Klabjan, and D. Russo. Improving the expected improvement algorithm.
In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan,
and R. Garnett, editors, Advances in Neural Information Processing Systems
30, pages 5381–5391. Curran Associates, Inc., 2017. [388]
F. Radlinski, R. Kleinberg, and T. Joachims. Learning diverse rankings with
multi-armed bandits. In Proceedings of the 25th International Conference on
Machine Learning, pages 784–791. ACM, 2008. [373, 374, 375]
A. N. Rafferty, H. Ying, and J. J. Williams. Bandit assignment for educational
experiments: Benefits to students versus statistical power. In Artificial
Intelligence in Education, pages 286–290. Springer, 2018. [16]
A. Rakhlin and K. Sridharan. BISTRO: An efficient relaxation-based method
for contextual bandits. In Proceedings of the 33rd International Conference on
Machine Learning, pages 1977–1985, 2016. [223]
A. Rakhlin and K. Sridharan. On equivalence of martingale tail bounds and
deterministic regret inequalities. In S. Kale and O. Shamir, editors, Proceedings
of the 2017 Conference on Learning Theory, volume 65 of Proceedings of
Machine Learning Research, pages 1704–1722, Amsterdam, Netherlands, 07–10
Jul 2017. PMLR. [269]
A. Rakhlin, O. Shamir, and K. Sridharan. Making gradient descent optimal for
strongly convex stochastic optimization. In Proceedings of the 29th International
Conference on Machine Learning (ICML), 2012. [389]
L. M. Rios and N. V. Sahinidis. Derivative-free optimization: a review of
algorithms and comparison of software implementations. Journal of Global
Optimization, 56(3):1247–1293, Jul 2013. [388]
H. Robbins. Some aspects of the sequential design of experiments. Bulletin of
the American Mathematical Society, 58(5):527–535, 1952. [15, 66, 91]
H. Robbins and D. Siegmund. Boundary crossing probabilities for the Wiener
process and sample sums. The Annals of Mathematical Statistics, pages 1410–1429,
1970. [245]
