Bandit Algorithms

(Jeff_L) #1
15.4 Bibliographic remarks 193

An estimator is a functionθˆ:Xn→Θ. Le Cam’s method is used for proving
minimax lower bounds on the expected error of the estimator, which is

inf
θˆ

sup
P∈P

EX 1 ,...,Xn∼Pn

[


d(θˆ(X 1 ,...,Xn),θ(P))

]


. (15.3)


The idea is to chooseP 0 ,P 1 ∈Pto maximized(θ(P 0 ),θ(P 1 ))exp(−nD(P 0 ,P 1 )),
on the basis that for anyP 0 ,P 1 ∈P,

Eq.(15.3)≥∆
8

exp (−nD(P 0 ,P 1 )), (15.4)

where ∆ =d(θ(P 0 ),θ(P 1 )). There are two differences compared to the bandit
lower bound:(i)we deal with the sequential setting and(ii)having chosenP 0
we chooseP 1 in a way that depends on the algorithm. This provides a much
needed extra boost, without which the method would be unable to capture
how the characteristics ofPare reflected in the minimax risk (or regret, in our
case).

15.4 Bibliographic remarks


The first work on lower bounds that we know of was the remarkably precise
minimax analysis of two-armed Bernoulli bandits by Vogel [1960]. The high
probability Pinsker inequality (Theorem 14.2) was first used for bandits by
Bubeck et al. [2013b]. As mentioned in the notes, the use of this inequality for
proving lower bounds is known as Le Cam’s method in statistics [Le Cam, 1973].
As far as we can tell, the earliest proof of the high probability Pinsker inequality
is due to Bretagnolle and Huber [1979], but we also recommend the book by
Tsybakov [2008]. The proof of Theorem 15.2 uses the same ideas as Gerchinovitz
and Lattimore [2016], while the alternative proof in Exercise 15.2 is essentially
due to Auer et al. [1995], who analyzed the more difficult case where the rewards
are Bernoulli (see Exercise 15.4). Yu [1997] describes some alternatives to Le
Cam’s method for the passive, statistical setting. These alternatives can be (and
often are) adapted to the sequential setting.

15.5 Exercises


15.1(Le Cam’s method) Establish the claim in Eq. (15.4).

15.2 (Alternative proof of Theorem 15.2) Here you will prove
Theorem 15.2 with a different method. Letc >0 and ∆ = 2c


k/nand for
eachi∈{ 0 , 1 ,...,k}letμ(i)∈Rksatisfyμ(ji)=I{i=j}∆. Further abbreviate
the notation in the proof of Theorem 15.2 by lettingEi[·] =Eμ(i)[·].
Free download pdf