Bandit Algorithms

(Jeff_L) #1
33.2 Best-arm identification with a fixed confidence 381

(c) WhenE=E^2 N(1) andν∈Ehas a unique optimal arm, then

c∗(ν)−^1 =

1


2


sup
α∈[0,1]

inf
ν′∈Ealt(ν)

{


α(μ 1 (ν)−μ 1 (ν′))^2 + (1−α)(μ 2 (ν)−μ 2 (ν′))^2

}


=


1


(^2) αsup∈[0,1]α(1−α) (μ^1 (ν)−μ^2 (ν))
(^2) =^1
8 (μ^1 (ν)−μ^2 (ν))
(^2).
In this case we observe thatα∗ 1 (ν) =α∗ 2 (ν) = 1/2.
(d)Suppose thatσ^2 ∈(0,∞)kis fixed andE={(N(μi,σi^2 )ki=1:μ∈Rk}. You
are asked in Exercise 33.4 to verify that whenk= 2,
c∗(ν) =
2(σ 1 +σ 2 )^2
∆^22


. (33.7)


which unsurprisingly shows the problem becomes harder as the variance of
either of the arms increases. In Exercise 33.4 you will show whenk≥2 it
holds that

2 σi^2 ∗
∆^2 min+


i 6 =i∗

2 σi^2
∆^2 i ≤c

∗(ν)≤^2 σ

2
i∗
∆^2 min+


i 6 =i∗

2 σi^2
∆^2 i + 2

√√


√√ 2 σ^2 i∗
∆^2 min


i 6 =i∗

2 σi^2
∆^2 i ,

where ∆min=mini 6 =i∗∆iis the smallest suboptimality gap. This bound
faithfully captures the intuition that each suboptimal arm must be played
sufficiently often to be distinguished from the optimal arm, while the optimal
arm must be observed sufficiently many times so that it can be distinguished
from the second best arm. Fork= 2 this bound is smaller than the value of
c∗(ν) as shown in(33.7), showing that there is room for improvement in this
case.

33.2.2 Policy, stopping/selection rule and upper bounds


The bounds in the previous section are tight for many environment classesE. For
simplicity we focus on the Gaussian case.

For this section we assume thatE=ENk(1) is the set ofk-armed Gaussian
bandits with unit variance.

We need to construct a pair (π,τ) that is sound forEand for whichEνπ[τ]
matches the lower bound in Theorem 33.5 asδ→0. Both are derived using
the insights provided by the lower bound. The policy should choose actioniin
proportion toα∗i(ν), which must be estimated from data. The stopping rule is
motivated by noting that Eq. (33.5) implies that a sound stopping rule must
satisfy
∑k

i=1

Eνπ[Ti(τ)] D(νi,νi′)≥log

(


4


δ

)


for allν′∈Ealt(ν).
Free download pdf