Bandit Algorithms

INDEX 550

minimax optimal, 172
mirror descent, 152, 310, 345, 355
model, 397
MOSS, 116, 118
multiclass classification with bandit feedback, 223
multitask bandit, 275, 342
mutual information, 438

nats, 178
neighboring actions, 453
non-singular exponential family, 136
nonoblivious, 321
nonoblivious adversary, 152
nonparametric, 56
nonstationary, 153
nonstationary bandit, 66
null set, 34

oblivious, 316, 321
oblivious adversary, 152
online gradient descent, 311
online learning, 15, 265, 309
online linear optimization, 309
online-to-confidence set conversion, 265
open set, 298
operator, 40
optimal experimental design, 252, 386
optimal value function, 481
optimism bias, 106
optimization oracle, 221, 349
optional stopping theorem, 49
ordinal optimization, 389
orthogonal complement, 469
outcome space, 18

packing, 246
parameter noise, 330
parametric, 56
Pareto optimal, 176, 394
Pareto optimal action, 453
partially observable Markov decision process, 497
peeling device, 119
permutation, 362

Pinsker’s inequality, 128, 135, 180, 183, 184, 199, 315
point-locally observable, 471
policy, 62
policy iteration, 499
policy schema, 406
position-based model, 363
posterior, 396
potential function, 310
predictable, 27
prediction with expert advice, 152
preimage, 19
prescriptive theory, 65
prior, 394, 397
prior variance, 399
probability distribution, 20
probability kernel, 47, 397
probability measure, 20
probability space, 20
product kernel, 47
product measure, 39, 63
projective, 46
pushforward, 20

quadratic variation, 164

Radon-Nikodym derivative, 38
random variable, 18
ranked bandit model, 373
ranking and selection, 389
reactive adversary, 152
reduction, 329, 377
regret, 9
    adversarial, 143
    nonstationary, 354
    policy, 153
    pseudo, 66
    pseudo, random, 205
    random, 66
    stochastic, 58
    tracking, 354
regret decomposition lemma, 60
regular exponential family, 136
regular version, 51, 186
regularizer, 310
