D. Bernoulli. Exposition of a new theory on the measurement of risk.
Econometrica: Journal of the Econometric Society, pages 23–36, 1954. [65]
A. C. Berry. The accuracy of the Gaussian approximation to the sum of
independent variates. Transactions of the American Mathematical Society,
49(1):122–136, 1941. [76]
D. Berry and B. Fristedt. Bandit problems: sequential allocation of experiments.
Chapman and Hall, London; New York, 1985. [15, 425, 426]
D. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific,
1st edition, 1996. [500]
D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1-2. Athena
Scientific, Belmont, MA, 4th edition, 2012. [499, 500]
D. Bertsimas and J. N. Tsitsiklis. Introduction to linear optimization, volume 6.
Athena Scientific, Belmont, MA, 1997. [500]
O. Besbes, Y. Gur, and A. Zeevi. Stochastic multi-armed-bandit problem with
non-stationary rewards. In Z. Ghahramani, M. Welling, C. Cortes, N. D.
Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information
Processing Systems 28, NIPS, pages 199–207. Curran Associates, Inc., 2014.
[360, 361]
A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. E. Schapire. An optimal
high probability algorithm for the contextual bandit problem. arXiv, 2010.
[164]
A. Beygelzimer, J. Langford, L. Li, L. Reyzin, and R. Schapire. Contextual bandit
algorithms with supervised learning guarantees. In G. Gordon, D. Dunson,
and M. Dudík, editors, Proceedings of the 14th International Conference on
Artificial Intelligence and Statistics, volume 15 of Proceedings of Machine
Learning Research, pages 19–26, Fort Lauderdale, FL, USA, 11–13 Apr 2011.
PMLR. [223, 226]
P. Billingsley. Probability and measure. John Wiley & Sons, 2008. [40, 52]
D. Blackwell. Controlled random walks. In Proceedings of the international
congress of mathematicians, volume 3, pages 336–338, 1954. [142]
L. Bottou, J. Peters, J. Quiñonero-Candela, D. X. Charles, D. M. Chickering,
E. Portugaly, D. Ray, P. Simard, and E. Snelson. Counterfactual reasoning
and learning systems: The example of computational advertising. The Journal
of Machine Learning Research, 14(1):3207–3260, 2013. [165]
S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities: A
nonasymptotic theory of independence. OUP Oxford, 2013. [78, 321]
G. E. P. Box. Science and statistics. Journal of the American Statistical
Association, 71(356):791–799, 1976. [142]
G. E. P. Box. Robustness in the strategy of scientific model building. Robustness
in statistics, 1:201–236, 1979. [142]
S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press,
2004. [298]