Bandit Algorithms
37.2 The structure of partial monitoring 452 Example37.6 (Dynamic pricing).A charity worker is going door-to-door selling calend ...
37.2 The structure of partial monitoring 453 vectors (ut)nt=1byut=eit. No information has been lost in this transformation from ...
37.2 The structure of partial monitoring 454 0 1 〈` 3 ,(u, 1 −u)〉 〈` 2 ,(u, (^1) − u)〉 〈^1 ,(u, 1 − u)〉 u Figure 37.1The figure ...
37.2 The structure of partial monitoring 455 optimal actions, and on the other hand there exist games for which〈`a,ut〉cannot be ...
37.3 Classification of finite adversarial partial monitoring 456 of their cells is (d−3)-dimensional. Finally,N 3 ={ 1 , 2 , 3 } ...
37.4 Lower bounds 457 chosen independently at random from the same distribution. To emphasize the randomness we switch to capita ...
37.4 Lower bounds 458 In this form it does not seem obvious what the next step should be. To clear things up we introduce some l ...
37.4 Lower bounds 459 whereCuis a suitably large constant. We note thatu∈Ca∩Cbis not on the boundary ofPd− 1 , soui>0 for all ...
37.5 Policy for easy games 460 C 1 u C 2 u 1 u 2 C 3 Figure 37.3Lower bound construction for hard partial monitoring problems Th ...
37.5 Policy for easy games 461 At∈ ⋃ a,bNab, where the union is over pairs of neighboring actions. Removing degenerate actions c ...
37.5 Policy for easy games 462 Step 3 (Redistribution) NowP ̃tis rebalanced to a new distributionPtfor which duplicate and degen ...
37.5 Policy for easy games 463 whereZˆtjais an unbiased estimator ofyta−ytjandβtjais a bias term: Zˆtja= P ̃tjvaj(At,Φt) PtAt an ...
37.6 Upper bound for easy games 464 Ca Cb C ̃a v u w Figure 37.4The construction used in the proof of Lemma 37.17. Suppose that ...
37.6 Upper bound for easy games 465 twice that〈`a−`b,w〉= 0, we calculate 〈`a−`b,u〉=〈`a−`b,u−w〉= ‖u−w‖ 2 ‖v−w‖ 2 〈`a−`b,w−v〉 = ‖u ...
37.6 Upper bound for easy games 466 1 −δ, Rˆn≤Rˆ′n+γn+√ 2 nlog(1/δ). (37.15) By Lemma 37.18, the surrogate regret is bounded in ...
37.6 Upper bound for easy games 467 By part (d) of Lemma 37.14 we havePtb≥γ/kfor alltandb∈[k]. Furthermore, |vaj(At,Φt)|≤Vso tha ...
37.7 Proof of the classification theorem 468 where the first equality used that ∑ aQtja= 1, the second to last inequality follow ...
37.8 Notes 469 thenR∗n(G) = Ω(n). All that remains is to prove that(a)ifGhas no neighboring actions, thenR∗n(G) = 0 and(b)ifGis ...
37.8 Notes 470 matrix multiplication,A(x) =Ax. TheimageofAisim(A) ={Ax:x∈Rn} and thekernelisker(A) ={x∈Rn:Ax= 0}. Notice thatim( ...
37.8 Notes 471 boundary of both regions, which happens to be the boundary ofCa, showing that the intersection of the line segmen ...
«
19
20
21
22
23
24
25
26
27
28
»
Free download pdf