Bandit Algorithms

(Jeff_L) #1
33.6 Exercises 390

results in Section 33.2. Glynn and Juneja [2015] gives further pointers to this
literature, while connecting it to the bandit literature.
Best-arm identification has also been considered in the adversarial setting
[Jamieson and Talwalkar, 2016, Li et al., 2018a, Abbasi-Yadkori et al., 2018].

33.6 Exercises


33.1(Simple regret lower bound) Show there exists a universal constant
C >0 such that for alln≥k >1 and all policiesπthere exists aν∈ Esuch
thatRsimplen (π,ν)≥C


k/n.

33.2(Suboptimality of uniform exploration) Show there exists a universal
constantC >0 such that for alln≥k >1 there exists aν∈ Esuch that
Rsimplen (UE,ν)≥C


klog(k)/n.

33.3 LetL >0 andD⊂[0,∞)k\{ 0 }be nonempty. Show that

inf

{


‖α‖ 1 :α∈[0,∞)k,inf
d∈D

〈α,d〉≥L

}


=


(


sup
α∈Pk− 1

inf
d∈D

〈α,d〉

)− 1


L.


33.4(Best-arm identification for Gaussian bandits) Letσ 12 ,...,σ^2 kbe
fixed andE={(N(μi,σ^2 i))ki=1:μ∈Rk}be the set of Gaussian bandits with given
variances. Letν∈ Ebe a bandit withμ 1 (ν)> μi(ν) for alli >1. Abbreviate
μ=μ(ν) and ∆ = ∆(ν).

(a) For anyα∈[0,∞)kshow that

inf
̃ν∈Ealt(ν)

∑k

i=1

αiD(νi,ν ̃i) =

1


2 mini> 1

α 1 αi∆^2 i
α 1 σ^2 i+αiσ 12.

(b) Show that ifk= 2, thenc∗(ν) = 2(σ 1 +σ 2 )^2 /∆^22.

(c) Show thatc∗(ν)≥

2 σ 12
∆min

+


∑k

i=2

2 σ^2 i
∆^2 i

.


(d) Show that

c∗(ν)≤

2 σ 12
∆min

+


∑k

i=2

2 σ^2 i
∆^2 i

+ 2


√√


√√ 2 σ 12
∆^2 min

∑k

i=2

2 σi^2
∆^2 i

. (33.9)


(e) Show that ifσ^2 i/∆^2 i=σ^21 /∆^2 minfor alli, then equality holds in Eq. (33.9).

33.5(Probably approximately correct algorithms) This exercise is about
designing (ε,δ)-PAC algorithms.
Free download pdf