Bandit Algorithms


INDEX 549


generalized linear model, 234
Gittins index, 360, 417
globally observable, 455
gradient descent, 311


Hahn decomposition, 30
hard partial monitoring problem, 452
Hardy–Littlewood, 421, 428
heavy tailed, 74
Hedge, 152
Hellinger distance, 184
Hoeffding’s inequality, 77, 83, 130, 372
Hoeffding’s lemma, 77, 80, 129, 152, 247
Hoeffding–Azuma, 247, 466, 493
hopeless partial monitoring problem, 452
Huffman coding, 178
hypothesis space, 397


image, 470
implicitly normalized forecaster, 326
importance-weighted estimator, 145, 146, 163, 221, 318
independent events, 28
index, 417
index policy, 417
indicator function, 22
information directed sampling, 444, 446
instance-dependent bound, 222
integrable, 30
interior, 297
Ionescu–Tulcea theorem, 47, 64, 403, 478
isomorphic measurable spaces, 45


Jensen’s inequality, 292
John’s ellipsoid, 307
joint distribution, 22
Jordan–Brouwer separation theorem, 464, 470


kernel, 470
kernel trick, 233
Kiefer–Wolfowitz, 252, 259, 304, 307, 317, 333
Kraft’s inequality, 182
Kullback–Leibler divergence, 177


Laplace’s method, 242
law, 20
law of the iterated logarithm, 107, 119, 248
lazy mirror descent, 321
Le Cam’s inequality, 182
Le Cam’s method, 193
learning rate, 147
    adaptive, 320, 325
    time-varying, 220, 311, 324, 325
least-squares, 238
Lebesgue integral, 29
Lebesgue measure, 31
Legendre function, 294–296, 310, 345
light tailed, 74
likelihood ratio, 250
linear subspace, 469
link function, 235
locally observable, 455
log partition function, 136, 400
log-concave, 305
loss matrix, 450


margin, 237
Markov chain, 47–48, 461, 478
Markov kernel, 47
Markov policy, 478
Markov process, 51
Markov property, 497
Markov reward process, 418
martingale, 48
maximal end-component, 511
maximal inequality, 50, 119, 245
maximum end-component, 512
measurable set, 20
measurable space, 20
measure, 20
median-of-means, 110
memoryless deterministic policy, 478
memoryless policy, 478
metric entropy, 247
minimax, 118