Bandit Algorithms

(Jeff_L) #1
12.2 Regret analysis 162

Proof LetAti=I{At=i}as before. WritingYt=


jAtjytj, we calculate

Yt−

∑k

j=1

PtjYˆtj=

∑k

j=1

(


1 −


Ptj
Ptj+γ

)


Atjytj=γ

∑k

j=1

Atj
Ptj+γ

ytj=γ

∑k

j=1

Yˆtj.

ThereforeL ̃n−Lˆn=γ

∑k
j=1Lˆnjas required.
Proof of Theorem 12.1 By Eq. (12.3) and Lemma 12.4 we have

Rˆn≤log(k)
η

+ (L ̃n−Lˆn) + max
i∈[k]

(Lˆni−Lni) +

η
2

∑k

j=1

Lˆnj

=


log(k)
η

+ max
i∈[k]

(Lˆni−Lni) +


2


)∑k

j=1

Lˆnj.

Therefore by Lemma 12.3, with probability at least 1−δit holds that

Rˆn≤log(k)
η

+


log

(k+1
δ

)


2 γ

+


(


γ+

η
2

)




∑k

j=1

Lnj+

log

(k+1
δ

)


2 γ




log(k)
η +

(


γ+

η
2

)


nk+

(


γ+

η
2 + 1

)log(k+1
δ

)


2 γ ,
where the second inequality follows sinceLnj≤nfor allj. The result follows by
substituting the definitions ofη∈{η 1 ,η 2 }andγ=η/2.

12.2.1 Proof of Lemma 12.2


We start with a technical inequality.

Lemma12.5.For any 0 ≤x≤ 2 λit holds thatexp

(


x
1 +λ

)


≤1 +x.

Note that 1 +x≤exp(x). What the lemma shows is that by slightly discounting
the argument of the exponential function, in a bounded neighborhood of zero,
1 +xcan be an upper bound for the resulting function. Or, equivalently, slightly
inflating the linear term in 1 +x, the linear lower bound becomes an upper bound.
Proof of Lemma 12.5 We have

exp

(


x
1 +λ

)


≤exp

(


x
1 +x/ 2

)


≤1 +x,

where the first inequality is becauseλ7→exp

(


x
1+λ

)


is decreasing inλ, and the
second is because1+^2 uu≤log(1 + 2u) holds for allu≥0. This latter inequality
can be seen to hold by noting that foru= 0 the two sides are equal, while the
derivative of the left-hand side is smaller than that of the right-hand side at any
u≥0.
Free download pdf