Bandit Algorithms

(Jeff_L) #1

INDEX 551


reinforcement learning, 12, 91, 425, 487
relative entropy, 137,179–182, 293
restless bandit, 360, 427
reward stack, 63, 421
ridge regression, 238
right stochastic matrix, 461 , 476
semibandit, 340,342–349, 373, 375,
445
separation oracle, 305, 485
sequential halving, 385
signal variance, 399
signed measure, 20
similarity function, 216
simple function, 30
Sion’s minimax theorem, 322 , 327, 406,
443
sliding window, 359
smoothness, 228
source coding theorem, 178
span, 481
state space, 476
static experts, 221
stationary transition matrix, 481, 504
stochastic optimization, 386
stochastic partial monitoring, 469
stochastic process, 46
stopping rule, 49
stopping time, 49
strictly convex, 291
strongly connected component, 511
sub-σ-algebra, 20
submartingale, 48 , 119
suboptimality gap, 60
sufficient statistic, 420, 426 , 497
of exponential family, 136, 400
supermartingale, 48 , 245
supervised learning, 216
support, 40
support function, 345, 351
supporting hyperplane, 296
theorem, 298 , 409

Thompson sampling, 66, 116, 429
for reinforcement learning, 501


total variation distance, 183 , 184
Track-and-Stop algorithm, 382
transductive learning, 223
transition matrix, 480
trivial event, 42
trivial partial monitoring problem, 452
uniform exploration algorithm, 376
union bound, 75
unnormalized negentropy, 293 , 295,
299, 312, 323, 343, 355
unrealizable, 276
unstructured bandits, 197
vanishing discount approach, 498
Varaiya’s algorithm, 426
von Neumann-Morgenstern theorem, 65
Wald–Bellman equation, 413
weak neighbor, 471
weak* topology, 40
worst-case regret, 172
zeroth-order stochastic optimization,
388
Free download pdf