Bandit Algorithms

(Jeff_L) #1
30.4 Follow the perturbed leader 346

All this shows that FTPL can be interpreted as mirror descent with potentialF
defined in terms of its Fenchel dual,
F∗(x) =E[φ(x+Z)]. (30.4)
There are more reasons for making this connection than mere curiosity. The
classical analysis of FTPL is highly probabilistic and involves at least one ‘leap of
faith’ in the analysis. In contrast, the analysis via the mirror descent interpretation
is more mechanical. Recall that mirror descent depends on choosing a potential,
an exploration distribution and an estimator. The exploration distribution is a
distributionPtonAsuch that
A ̄t=


a∈A

Pt(a)a,

which in our case is simply defined by
Pt(a) =P(a(Zt−ηLˆt− 1 ) =a|Ft− 1 ).
It remains to choose the loss estimator. A natural choice would be the same as
Eq. (30.1), which isYˆti=Atiyti/PtiwithPti=P(Ati= 1|Ft− 1 ). The problem
is thatPtidoes not generally have a closed form solution. And while it can be
estimated by sampling, the number of samples required for sufficient accuracy can
be quite large. The idea is to replace 1/Ptiin the importance weighted estimator
with a random variable with conditional expectation equal to 1/Pti.
Lemma30.3.LetU∈N+be geometrically distributed with parameterθ∈[0,1]
so thatP(U=j) = (1−θ)j−^1 θ. ThenE[U] = 1/θ.
For eachtletKt 1 ,...,Ktdbe a sequence of geometric random variables that
are conditionally independent givenFtand with
P(Kti=j|Ft) = (1−Pti)j−^1 Pti.
Then define the estimator by ofytiby
Yˆti= min{β,KtiAityit}.

The truncation parameterβis needed to ensure thatYˆtiis never too large, but
note that without it we would have
E[KtiAityit|Ft− 1 ] =yit.
We have now provided all the pieces to define mirror descent. The algorithm is
summarized in Algorithm 18.
Theorem30.4. LetQhave density with respect to the Lebesgue measure of
q(z) = 2−dexp(−‖z‖ 1 )and

η=


2(1 + log(d))
(1 +e)dnm

β=

1


ηm

.


Then the regret of Algorithm 18 is bounded byRn≤m



2(1 +e)nd(1 + log(d)).
Free download pdf