Bandit Algorithms

30.4 Follow the perturbed leader 346

All this shows that FTPL can be interpreted as mirror descent with potentialF defined in terms of its Fenchel dual, F∗(x) =E[φ(x+Z)]. (30.4) There are more reasons for making this connection than mere curiosity. The classical analysis of FTPL is highly probabilistic and involves at least one ‘leap of faith’ in the analysis. In contrast, the analysis via the mirror descent interpretation is more mechanical. Recall that mirror descent depends on choosing a potential, an exploration distribution and an estimator. The exploration distribution is a distributionPtonAsuch that A ̄t=

∑

a∈A

Pt(a)a,

which in our case is simply defined by Pt(a) =P(a(Zt−ηLˆt− 1 ) =a|Ft− 1 ). It remains to choose the loss estimator. A natural choice would be the same as Eq. (30.1), which isYˆti=Atiyti/PtiwithPti=P(Ati= 1|Ft− 1 ). The problem is thatPtidoes not generally have a closed form solution. And while it can be estimated by sampling, the number of samples required for sufficient accuracy can be quite large. The idea is to replace 1/Ptiin the importance weighted estimator with a random variable with conditional expectation equal to 1/Pti. Lemma30.3.LetU∈N+be geometrically distributed with parameterθ∈[0,1] so thatP(U=j) = (1−θ)j−^1 θ. ThenE[U] = 1/θ. For eachtletKt 1 ,...,Ktdbe a sequence of geometric random variables that are conditionally independent givenFtand with P(Kti=j|Ft) = (1−Pti)j−^1 Pti. Then define the estimator by ofytiby Yˆti= min{β,KtiAityit}.

The truncation parameterβis needed to ensure thatYˆtiis never too large, but note that without it we would have E[KtiAityit|Ft− 1 ] =yit. We have now provided all the pieces to define mirror descent. The algorithm is summarized in Algorithm 18. Theorem30.4. LetQhave density with respect to the Lebesgue measure of q(z) = 2−dexp(−‖z‖ 1 )and

η=

√

2(1 + log(d)) (1 +e)dnm

β=

1

ηm

.

Then the regret of Algorithm 18 is bounded byRn≤m

√

2(1 +e)nd(1 + log(d)).

Bandit Algorithms

∑

√

1

.

√

Get our desktop app

Company

Features

Documentation

Resources