Bandit Algorithms

INDEX 550


minimax optimal, 172
mirror descent, 152, 310, 345, 355
model, 397
MOSS, 116, 118
multiclass classification with bandit feedback, 223
multitask bandit, 275, 342
mutual information, 438


nats, 178
neighboring actions, 453
non-singular exponential family, 136
nonoblivious, 321
nonoblivious adversary, 152
nonparametric, 56
nonstationary, 153
nonstationary bandit, 66
null set, 34


oblivious, 316, 321
oblivious adversary, 152
online gradient descent, 311
online learning, 15, 265, 309
online linear optimization, 309
online-to-confidence set conversion, 265
open set, 298
operator, 40
optimal experimental design, 252, 386
optimal value function, 481
optimism bias, 106
optimization oracle, 221, 349
optional stopping theorem, 49
ordinal optimization, 389
orthogonal complement, 469
outcome space, 18


packing, 246
parameter noise, 330
parametric, 56
Pareto optimal, 176, 394
Pareto optimal action, 453
partially observable Markov decision process, 497
peeling device, 119
permutation, 362


Pinsker’s inequality, 128, 135, 180, 183, 184, 199, 315
point-locally observable, 471
policy, 62
policy iteration, 499
policy schema, 406
position-based model, 363
posterior, 396
potential function, 310
predictable, 27
prediction with expert advice, 152
preimage, 19
prescriptive theory, 65
prior, 394, 397
prior variance, 399
probability distribution, 20
probability kernel, 47, 397
probability measure, 20
probability space, 20
product kernel, 47
product measure, 39, 63
projective, 46
pushforward, 20
quadratic variation, 164
Radon-Nikodym derivative, 38
random variable, 18
ranked bandit model, 373
ranking and selection, 389
reactive adversary, 152
reduction, 329, 377
regret, 9
    adversarial, 143
    nonstationary, 354
    policy, 153
    pseudo, 66
    pseudo, random, 205
    random, 66
    stochastic, 58
    tracking, 354
regret decomposition lemma, 60
regular exponential family, 136
regular version, 51, 186
regularizer, 310