Bandit Algorithms

INDEX 551

reinforcement learning, 12, 91, 425, 487 relative entropy, 137,179–182, 293 restless bandit, 360, 427 reward stack, 63, 421 ridge regression, 238 right stochastic matrix, 461 , 476 semibandit, 340,342–349, 373, 375, 445 separation oracle, 305, 485 sequential halving, 385 signal variance, 399 signed measure, 20 similarity function, 216 simple function, 30 Sion’s minimax theorem, 322 , 327, 406, 443 sliding window, 359 smoothness, 228 source coding theorem, 178 span, 481 state space, 476 static experts, 221 stationary transition matrix, 481, 504 stochastic optimization, 386 stochastic partial monitoring, 469 stochastic process, 46 stopping rule, 49 stopping time, 49 strictly convex, 291 strongly connected component, 511 sub-σ-algebra, 20 submartingale, 48 , 119 suboptimality gap, 60 sufficient statistic, 420, 426 , 497 of exponential family, 136, 400 supermartingale, 48 , 245 supervised learning, 216 support, 40 support function, 345, 351 supporting hyperplane, 296 theorem, 298 , 409

Thompson sampling, 66, 116, 429
for reinforcement learning, 501

total variation distance, 183 , 184 Track-and-Stop algorithm, 382 transductive learning, 223 transition matrix, 480 trivial event, 42 trivial partial monitoring problem, 452 uniform exploration algorithm, 376 union bound, 75 unnormalized negentropy, 293 , 295, 299, 312, 323, 343, 355 unrealizable, 276 unstructured bandits, 197 vanishing discount approach, 498 Varaiya’s algorithm, 426 von Neumann-Morgenstern theorem, 65 Wald–Bellman equation, 413 weak neighbor, 471 weak* topology, 40 worst-case regret, 172 zeroth-order stochastic optimization, 388

Bandit Algorithms

INDEX 551

Get our desktop app

Company

Features

Documentation

Resources