Bandit Algorithms

28.2 Regret analysis

for all $t$. Then mirror descent with the unnormalized negentropy potential and
$\eta = \sqrt{2\log(d)/n}$ satisfies $R_n \leq \sqrt{2n\log(d)}$.

Proof  The Bregman divergence with respect to the unnormalized negentropy
potential for $a, b \in \mathcal{A}$ is $D(a, b) = \sum_{i=1}^d a_i \log(a_i / b_i)$. Therefore

$$
\begin{aligned}
R_n(a) &\leq \frac{F(a) - F(a_1)}{\eta} + \sum_{t=1}^n \langle a_t - a_{t+1}, y_t\rangle - \frac{1}{\eta}\sum_{t=1}^n D(a_{t+1}, a_t) \\
&\leq \frac{\log(d)}{\eta} + \sum_{t=1}^n \|a_t - a_{t+1}\|_1 \|y_t\|_\infty - \frac{1}{\eta}\sum_{t=1}^n \frac{1}{2}\|a_t - a_{t+1}\|_1^2 \\
&\leq \frac{\log(d)}{\eta} + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_\infty^2
 \leq \frac{\log(d)}{\eta} + \frac{\eta n}{2}
 = \sqrt{2n\log(d)}\,,
\end{aligned}
$$

where the first inequality follows from Theorem 28.4, the second from Pinsker's
inequality and the fact that $\operatorname{diam}_F(\mathcal{A}) = \log(d)$. In the third inequality we used
the fact that $ax - bx^2/2 \leq a^2/(2b)$ for all $x$. The last inequality follows from
the assumption that $\|y_t\|_\infty \leq 1$.
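On the probability simplex, mirror descent with the unnormalized negentropy potential is the familiar exponential-weights update $a_{t+1,i} \propto a_{t,i} e^{-\eta y_{t,i}}$. The following sketch (not code from the book; function name and loss model are illustrative assumptions) runs this update with the $\eta$ from the proposition on losses satisfying $\|y_t\|_\infty \leq 1$ and compares the realized regret to the bound $\sqrt{2n\log(d)}$:

```python
import math
import random

def mirror_descent_regret(n, d, seed=0):
    """Exponential-weights mirror descent on the simplex (negentropy potential).

    Illustrative sketch: losses y_t are drawn uniformly from [0, 1]^d so that
    ||y_t||_inf <= 1, as the proposition assumes.  Returns the realized regret
    against the best fixed coordinate and the theoretical bound sqrt(2 n log d).
    """
    rng = random.Random(seed)
    eta = math.sqrt(2 * math.log(d) / n)
    a = [1.0 / d] * d              # a_1 = uniform distribution
    total = [0.0] * d              # cumulative loss of each coordinate
    alg_loss = 0.0
    for _ in range(n):
        y = [rng.random() for _ in range(d)]
        alg_loss += sum(ai * yi for ai, yi in zip(a, y))
        for i in range(d):
            total[i] += y[i]
        # Mirror descent step: a_{t+1, i} proportional to a_{t, i} * exp(-eta * y_{t, i})
        w = [ai * math.exp(-eta * yi) for ai, yi in zip(a, y)]
        s = sum(w)
        a = [wi / s for wi in w]
    regret = alg_loss - min(total)
    bound = math.sqrt(2 * n * math.log(d))
    return regret, bound
```

On benign random losses like these the realized regret is typically far below the worst-case bound, which holds for any loss sequence with $\|y_t\|_\infty \leq 1$.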


The last few steps in the above proof are so routine that we summarize their
use in a corollary, the proof of which we leave to the reader (Exercise 28.4).

Corollary 28.8. Let $F$ be a Legendre potential and $\|\cdot\|_t$ be a norm on $\mathbb{R}^d$ for
each $t \in [n]$ such that $D_F(a_{t+1}, a_t) \geq \frac{1}{2}\|a_{t+1} - a_t\|_t^2$. Then the regret of mirror
descent or follow the regularized leader satisfies
$$
R_n \leq \frac{\operatorname{diam}_F(\mathcal{A})}{\eta} + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_{t*}^2\,,
$$
where $\|y\|_{t*} = \max_{x : \|x\|_t \leq 1} \langle x, y\rangle$ is the dual norm of $\|\cdot\|_t$.


It often happens that the easiest way to bound the regret of mirror descent
is to find a norm that satisfies the conditions of Corollary 28.8. Very often
Theorem 26.12 provides a good approach.
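As a worked instance of this recipe (a sketch, restating what the proof of Proposition 28.7 already established): for the unnormalized negentropy potential, Pinsker's inequality gives $D_F(a_{t+1}, a_t) \geq \frac{1}{2}\|a_{t+1} - a_t\|_1^2$, so one may take $\|\cdot\|_t = \|\cdot\|_1$ for every $t$, whose dual norm is $\|\cdot\|_\infty$. Corollary 28.8 then yields

```latex
R_n \leq \frac{\operatorname{diam}_F(\mathcal{A})}{\eta}
      + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_\infty^2
    \leq \frac{\log(d)}{\eta} + \frac{\eta n}{2}\,,
```

and tuning $\eta = \sqrt{2\log(d)/n}$ recovers the bound $\sqrt{2n\log(d)}$.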


Example 28.9. To illustrate a suboptimal application of mirror descent,
suppose we had chosen $F(a) = \frac{1}{2}\|a\|_2^2$ in the setting of Proposition 28.7. Then
$D_F(a_{t+1}, a_t) = \frac{1}{2}\|a_{t+1} - a_t\|_2^2$ suggests choosing $\|\cdot\|_t$ to be the standard Euclidean
norm. Since $\operatorname{diam}_F(\mathcal{A}) = 1/2$ and $\|\cdot\|_{2*} = \|\cdot\|_2$, applying Corollary 28.8 shows
that
$$
R_n \leq \frac{1}{2\eta} + \frac{\eta}{2}\sum_{t=1}^n \|y_t\|_2^2\,.
$$
But now we see that $\|y_t\|_2^2$ can be as large as $d$ and tuning $\eta$ would lead to a rate
of $O(\sqrt{nd})$ rather than $O(\sqrt{n\log(d)})$.
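The gap between the two tuned rates can be made concrete with a few lines of arithmetic (an illustrative sketch; the function name is an assumption). Minimizing $\frac{1}{2\eta} + \frac{\eta n d}{2}$ over $\eta$ gives $\eta = 1/\sqrt{nd}$ and value $\sqrt{nd}$, to be compared with $\sqrt{2n\log(d)}$ from Proposition 28.7:

```python
import math

def tuned_rates(n, d):
    """Compare the two tuned regret rates discussed in Example 28.9.

    Euclidean potential: bound 1/(2*eta) + (eta/2)*n*d is minimized at
    eta = 1/sqrt(n*d), giving sqrt(n*d).  Negentropy potential:
    sqrt(2*n*log(d)) from Proposition 28.7.
    """
    euclidean = math.sqrt(n * d)
    negentropy = math.sqrt(2 * n * math.log(d))
    return euclidean, negentropy
```

For instance, with $n = 10{,}000$ and $d = 1{,}000$ the Euclidean rate is $\sqrt{10^7} \approx 3162$ while the negentropy rate is roughly $372$, since $2\log(d) \ll d$ once $d$ is large.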