Bandit Algorithms

INDEX 548

concave, 292
conditional entropy, 438
conditional expectation, 32
conditional independence, 35
conditional probability, 27
conjugate pair, 399
conjugate prior, 399
consistent policy, 197, 280
contextual bandit, 67, 213, 358
  adversarial linear, 332
  stochastic, 221, 224, 227–229
  stochastic linear, 229–233, 332
controlled Markov environment, 502
convex hull, 290
convex optimization, 309
counting measure, 39
covering, 239, 246
Cramér–Chernoff method, 74–76, 161, 241, 242
cumulant generating function, 74, 136
cumulative distribution function, 32

D-optimal design, 252
degenerate action, 453
derivative-free stochastic optimization, 388
descriptive theory, 65
deviation matrix, 504
diameter
  of convex set, 314
  of MDP, 479
differential value function, 481
discount factor, 358
discounting, 358, 418, 425
disintegration theorem, 52, 407, 426
distribution, 20
domain, 290
dominated action, 453
dominating measure, 38
doubling trick, 90, 93, 154, 270, 320
dual norm, 315
dynamic programming, 424

easy partial monitoring problem, 452
empirical process, 83
empirical risk minimization, 222
entropy, 177, 178, 184
Exp3, 147, 214, 302, 304, 309, 323, 341, 353, 355, 451
Exp3-IX, 159, 204, 341, 360
Exp3.P, 165, 223, 360
Exp3.S, 360
Exp4, 218, 236, 354
expectation, 29
explore-then-commit, 87, 105, 221, 235
exponential family, 116, 135, 136, 201, 202, 380, 401, 442
exponential weighting, 147, 345
  algorithm, 152
  continuous, 304–306
extended real line, 290

feasible, 483
feature map, 228
feature space, 228
feature vector, 228
feedback matrix, 450
Fenchel dual, 80, 291
filtered probability space, 27
filtration, 27
finite additivity, 20
first order bound, 164, 320, 325
first-order optimality condition, 296
Fisher information, 192
Fixed Share, 360
follow the leader, 310, 321
follow the perturbed leader, 156, 221, 345
follow the regularized leader, 310, 333, 334
  changing potentials, 324
Frank-Wolfe algorithm, 254
Fubini’s theorem, 39
full information, 152, 360
fundamental matrix, 504

G-optimal design, 252
gain, 480
game theory, 175
Gaussian tail lower bound, 446
