Bandit Algorithms

(Jeff_L) #1
28.1 Online linear optimization 311

occurs in the interior ofD ⊆Aand that∇Φt(at+1) = 0 (see Exercise 28.1). This
means thatηyt=∇F(at)−∇F(at+1) and so

∇F(at+1) =−ηyt+∇F(at) =∇F(a 1 )−η

∑t

s=1

ys=−η

∑t

s=1

ys,

where the last equality is true becausea 1 is chosen as the minimizer ofF in
A∩D=Dand again the fact thatFis Legendre ensures this minimum occurs
at an interior point where the gradient vanishes. Follow the regularized leader
choosesat+1to minimize Φ′t(a) =η


∑t
s=1〈a,ys〉+F(a). The same argument
shows that∇Φ′t(at+1) = 0, which means that

∇F(at+1) =−η

∑t

s=1

ys.

The last two displays and the fact that the gradient for Legendre functions is
invertible shows that mirror descent and follow the regularized leader are the
same in this setting.


The equivalence between these algorithms is far from universal. First of
all, it does not generally hold whenF is not Legendre or its domain is
larger thanA. Second, in many applications of these algorithms the learning
rate or potential change with time and in either case the algorithms will
typically produce different action sequences. For example, if a learning
rateηtis used rather thanηin the definition of Φt, then mirror descent
chooses∇F(at+1) =−

∑t
s=1ηsyswhile follow the regularized leader chooses
∇F(at+1) =−ηt

∑t
s=1ys. We return to this issue in the notes and exercises.

Example 28.1. LetA = Rd andF(a) =^12 ‖a‖^22. Then ∇F(a) = a and
D(a,at) =^12 ‖a−at‖^22. ClearlyF is Legendre andD=Aso mirror descent
and follow the regularized leader are the same. By simple calculus we see that

at+1= argmina∈Rdη〈a,yt〉+

1


2 ‖a−at‖

2
2 =at−ηyt,

which may be familiar asonline gradient descent.


Example28.2. LetAbe a compact convex subset ofRdandF(a) =^12 ‖a‖^22.
Then mirror descent chooses


at+1= argmina∈Aη〈a,yt〉+

1


2


‖a−at‖^22 = Π(at−ηyt), (28.4)

where Π(a) is the Euclidean projection ofaontoA. This algorithm is usually called
online projected gradient descent. On the other hand, for follow the regularized
leader we have


at+1= argmina∈Aη

∑t

s=1

〈a,ys〉+

1


2


‖a−at‖^22 = Π

(


−η

∑t

s=1

ys

)


,

Free download pdf