Bandit Algorithms


INDEX 548


concave, 292
conditional entropy, 438
conditional expectation, 32
conditional independence, 35
conditional probability, 27
conjugate pair, 399
conjugate prior, 399
consistent policy, 197, 280
contextual bandit, 67, 213, 358
    adversarial linear, 332
    stochastic, 221, 224, 227–229
    stochastic linear, 229–233, 332
controlled Markov environment, 502
convex hull, 290
convex optimization, 309
counting measure, 39
covering, 239, 246
Cramér–Chernoff method, 74–76, 161, 241, 242
cumulant generating function, 74, 136
cumulative distribution function, 32


D-optimal design, 252
degenerate action, 453
derivative-free stochastic optimization, 388
descriptive theory, 65
deviation matrix, 504
diameter
    of convex set, 314
    of MDP, 479
differential value function, 481
discount factor, 358
discounting, 358, 418, 425
disintegration theorem, 52, 407, 426
distribution, 20
domain, 290
dominated action, 453
dominating measure, 38
doubling trick, 90, 93, 154, 270, 320
dual norm, 315
dynamic programming, 424


easy partial monitoring problem, 452
empirical process, 83
empirical risk minimization, 222
entropy, 177, 178, 184
Exp3, 147, 214, 302, 304, 309, 323, 341, 353, 355, 451
Exp3-IX, 159, 204, 341, 360
Exp3.P, 165, 223, 360
Exp3.S, 360
Exp4, 218, 236, 354
expectation, 29
explore-then-commit, 87, 105, 221, 235
exponential family, 116, 135, 136, 201, 202, 380, 401, 442
exponential weighting, 147, 345
    algorithm, 152
    continuous, 304–306
extended real line, 290


feasible, 483
feature map, 228
feature space, 228
feature vector, 228
feedback matrix, 450
Fenchel dual, 80, 291
filtered probability space, 27
filtration, 27
finite additivity, 20
first order bound, 164, 320, 325
first-order optimality condition, 296
Fisher information, 192
Fixed Share, 360
follow the leader, 310, 321
follow the perturbed leader, 156, 221, 345
follow the regularized leader, 310, 333, 334
    changing potentials, 324
Frank-Wolfe algorithm, 254
Fubini’s theorem, 39
full information, 152, 360
fundamental matrix, 504


G-optimal design, 252
gain, 480
game theory, 175
Gaussian tail lower bound, 446