Bandit Algorithms

28.2 Regret analysis

for all $t$. Then mirror descent with the unnormalized negentropy potential and
$\eta = \sqrt{2\log(d)/n}$ satisfies $R_n \leq \sqrt{2n\log(d)}$.

Proof  The Bregman divergence with respect to the unnormalized negentropy
potential for $a, b \in \mathcal{A}$ is $D(a, b) = \sum_{i=1}^d a_i \log(a_i / b_i)$. Therefore

$$
\begin{aligned}
R_n(a) &\leq \frac{F(a) - F(a_1)}{\eta} + \sum_{t=1}^n \langle a_t - a_{t+1}, y_t\rangle - \frac{1}{\eta}\sum_{t=1}^n D(a_{t+1}, a_t) \\
&\leq \frac{\log(d)}{\eta} + \sum_{t=1}^n \|a_t - a_{t+1}\|_1 \|y_t\|_\infty - \frac{1}{\eta}\sum_{t=1}^n \frac{1}{2}\|a_t - a_{t+1}\|_1^2 \\
&\leq \frac{\log(d)}{\eta} + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_\infty^2
 \leq \frac{\log(d)}{\eta} + \frac{\eta n}{2}
 = \sqrt{2n\log(d)}\,,
\end{aligned}
$$

where the first inequality follows from Theorem 28.4, the second from Pinsker's
inequality and the fact that $\operatorname{diam}_F(\mathcal{A}) = \log(d)$. In the third inequality we used
the fact that $ax - bx^2/2 \leq a^2/(2b)$ for all $x$. The last inequality follows from
the assumption that $\|y_t\|_\infty \leq 1$.
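On the probability simplex, mirror descent with the unnormalized negentropy potential is the familiar exponential-weights update $a_{t+1,i} \propto a_{t,i} e^{-\eta y_{t,i}}$. The following sketch (not code from the book; function name and loss model are illustrative assumptions) runs this update with the $\eta$ from the proposition on losses satisfying $\|y_t\|_\infty \leq 1$ and compares the realized regret to the bound $\sqrt{2n\log(d)}$:

```python
import math
import random

def mirror_descent_regret(n, d, seed=0):
    """Exponential-weights mirror descent on the simplex (negentropy potential).

    Illustrative sketch: losses y_t are drawn uniformly from [0, 1]^d so that
    ||y_t||_inf <= 1, as the proposition assumes.  Returns the realized regret
    against the best fixed coordinate and the theoretical bound sqrt(2 n log d).
    """
    rng = random.Random(seed)
    eta = math.sqrt(2 * math.log(d) / n)
    a = [1.0 / d] * d              # a_1 = uniform distribution
    total = [0.0] * d              # cumulative loss of each coordinate
    alg_loss = 0.0
    for _ in range(n):
        y = [rng.random() for _ in range(d)]
        alg_loss += sum(ai * yi for ai, yi in zip(a, y))
        for i in range(d):
            total[i] += y[i]
        # Mirror descent step: a_{t+1, i} proportional to a_{t, i} * exp(-eta * y_{t, i})
        w = [ai * math.exp(-eta * yi) for ai, yi in zip(a, y)]
        s = sum(w)
        a = [wi / s for wi in w]
    regret = alg_loss - min(total)
    bound = math.sqrt(2 * n * math.log(d))
    return regret, bound
```

On benign random losses like these the realized regret is typically far below the worst-case bound, which holds for any loss sequence with $\|y_t\|_\infty \leq 1$.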


The last few steps in the above proof are so routine that we summarize their
use in a corollary, the proof of which we leave to the reader (Exercise 28.4).

Corollary 28.8. Let $F$ be a Legendre potential and $\|\cdot\|_t$ be a norm on $\mathbb{R}^d$ for
each $t \in [n]$ such that $D_F(a_{t+1}, a_t) \geq \frac{1}{2}\|a_{t+1} - a_t\|_t^2$. Then the regret of mirror
descent or follow the regularized leader satisfies
$$
R_n \leq \frac{\operatorname{diam}_F(\mathcal{A})}{\eta} + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_{t*}^2\,,
$$
where $\|y\|_{t*} = \max_{x : \|x\|_t \leq 1} \langle x, y\rangle$ is the dual norm of $\|\cdot\|_t$.


It often happens that the easiest way to bound the regret of mirror descent
is to find a norm that satisfies the conditions of Corollary 28.8. Very often
Theorem 26.12 provides a good approach.
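As a worked instance of this recipe (a sketch, restating what the proof of Proposition 28.7 already established): for the unnormalized negentropy potential, Pinsker's inequality gives $D_F(a_{t+1}, a_t) \geq \frac{1}{2}\|a_{t+1} - a_t\|_1^2$, so one may take $\|\cdot\|_t = \|\cdot\|_1$ for every $t$, whose dual norm is $\|\cdot\|_\infty$. Corollary 28.8 then yields

```latex
R_n \leq \frac{\operatorname{diam}_F(\mathcal{A})}{\eta}
      + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_\infty^2
    \leq \frac{\log(d)}{\eta} + \frac{\eta n}{2}\,,
```

and tuning $\eta = \sqrt{2\log(d)/n}$ recovers the bound $\sqrt{2n\log(d)}$.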


Example 28.9. To illustrate a suboptimal application of mirror descent,
suppose we had chosen $F(a) = \frac{1}{2}\|a\|_2^2$ in the setting of Proposition 28.7. Then
$D_F(a_{t+1}, a_t) = \frac{1}{2}\|a_{t+1} - a_t\|_2^2$ suggests choosing $\|\cdot\|_t$ to be the standard Euclidean
norm. Since $\operatorname{diam}_F(\mathcal{A}) = 1/2$ and $\|\cdot\|_{2*} = \|\cdot\|_2$, applying Corollary 28.8 shows
that
$$
R_n \leq \frac{1}{2\eta} + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_2^2\,.
$$
But now we see that $\|y_t\|_2^2$ can be as large as $d$ and tuning $\eta$ would lead to a rate
of $O(\sqrt{nd})$ rather than $O(\sqrt{n\log(d)})$.
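The gap between the two tuned rates can be made concrete with a few lines of arithmetic (an illustrative sketch; the function name is an assumption). Minimizing $\frac{1}{2\eta} + \frac{\eta n d}{2}$ over $\eta$ gives $\eta = 1/\sqrt{nd}$ and value $\sqrt{nd}$, to be compared with $\sqrt{2n\log(d)}$ from Proposition 28.7:

```python
import math

def tuned_rates(n, d):
    """Compare the two tuned regret rates discussed in Example 28.9.

    Euclidean potential: bound 1/(2*eta) + (eta/2)*n*d is minimized at
    eta = 1/sqrt(n*d), giving sqrt(n*d).  Negentropy potential:
    sqrt(2*n*log(d)) from Proposition 28.7.
    """
    euclidean = math.sqrt(n * d)
    negentropy = math.sqrt(2 * n * math.log(d))
    return euclidean, negentropy
```

For instance, with $n = 10{,}000$ and $d = 1{,}000$ the Euclidean rate is $\sqrt{10^7} \approx 3162$ while the negentropy rate is roughly $372$, since $2\log(d) \ll d$ once $d$ is large.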