1.3 Bibliographic remarks 16
mobile health: In a typical application, the user is prompted with the aim of
inducing a long-term beneficial change in behavior. See also the article by
Greenewald et al. [2017]. Rafferty et al. [2018] apply Thompson sampling to
educational software and note the tradeoff between knowledge and reward.
Villar et al. [2015] explicitly noted that bandit algorithms had not been used in
clinical trials. Microsoft offers a ‘Decision Service’ that uses bandit algorithms
to automate decision-making [Agarwal et al., 2016]. We already mentioned that
bandit algorithms are a cornerstone of Monte-Carlo Tree Search [Kocsis and
Szepesvári, 2006].