Bandit Algorithms
21.1 The Kiefer–Wolfowitz theorem 252 In the subfield of statistics called optimal experimental design, the distribution πis cal ...
21.1 The Kiefer–Wolfowitz theorem 253 (b)⇒(a): Suppose thatπ∗is a maximizer off. By the first-order optimality criterion (see Se ...
21.2 Notes 254 Geometric interpretation There is a geometric interpretation of theD-optimal design problem. Letπbe a D-optimal d ...
21.3 Bibliographic remarks 255 whereak=argmaxa∈A‖a‖^2 V(πk)− 1 and the stepsize is chosen to optimizef along the line connecting ...
21.4 Exercises 256 21.4 Exercises 21.1(Derivative of log determinant) Prove the correctness of the derivative in Eq. (21.4). Hin ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
22.1 Notes 258 Input A⊂Rdandδ Step 0Set`= 1 and letA 1 =A Step 1Lett`=tbe the current timestep and findG-optimal designπ`∈P(A`) ...
22.2 Bibliographic remarks 259 22.2 Bibliographic remarks The algorithms achieving Eq. (22.1) for changing action sets are SupLi ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
23.2 Elimination on the hypercube 261 Assumption23.1.The following hold: (a)(Sparse parameter)There exist known constants m 0 an ...
23.2 Elimination on the hypercube 262 So learning the optimal action amounts to learning the sign ofθifor each dimension. A disa ...
23.2 Elimination on the hypercube 263 1:Input nandd 2:SetE 1 i= 1 andC 1 i=Rfor alli∈[d] 3:fort= 1,...,ndo 4: For eachi∈[d] samp ...
23.2 Elimination on the hypercube 264 Clearly ifθi= 0, thenRni= 0. And so it suffices to boundRnifor eachiwith |θi|>0. Suppos ...
23.3 Online to confidence set conversion 265 to last inequality follows becauseAtiis conditionally Rademacher fort≤τi, which is ...
23.3 Online to confidence set conversion 266 Θ ={x:‖x‖ 0 ≤m 0 }, but first we show how to use such a learning algorithm to const ...
23.3 Online to confidence set conversion 267 the following holds for allt: Qt= ∑t s=1 (Xˆs−〈As,θ∗〉)^2 ≤βt(δ). We could defineCt+ ...
23.4 Sparse online linear prediction 268 one can see that ‖θ‖^22 + ∑t s=1 (Xˆs−〈As,θ〉)^2 =‖θ−θˆt‖^2 Vt+ ∑t s=1 (Xˆs−〈θˆt,As〉)^2 ...
23.5 Notes 269 23.5 Notes 1 The strategy achieving the bound in Theorem 23.6 is not computationally efficient. In fact we do not ...
23.7 Exercises 270 23.7 Exercises 23.1(The zero-‘norm’) A norm onRdis a function‖·‖:Rd→Rsuch that for alla∈Randx,y∈Rdit holds th ...
This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This p ...
«
9
10
11
12
13
14
15
16
17
18
»
Free download pdf