
we use Lemma 8.2 to get


\begin{align*}
\mathbb{E}\left[\sum_{t=1}^{n} \mathbb{I}\left\{\hat\mu_i(t-1) + \sqrt{\frac{2\log f(t)}{T_i(t-1)}} \ge \mu_1 - \varepsilon \text{ and } A_t = i\right\}\right]
&\le \mathbb{E}\left[\sum_{t=1}^{n} \mathbb{I}\left\{\hat\mu_i(t-1) + \sqrt{\frac{2\log f(n)}{T_i(t-1)}} \ge \mu_1 - \varepsilon \text{ and } A_t = i\right\}\right] \\
&\le \mathbb{E}\left[\sum_{s=1}^{n} \mathbb{I}\left\{\hat\mu_{is} + \sqrt{\frac{2\log f(n)}{s}} \ge \mu_1 - \varepsilon\right\}\right] \\
&= \mathbb{E}\left[\sum_{s=1}^{n} \mathbb{I}\left\{\hat\mu_{is} - \mu_i + \sqrt{\frac{2\log f(n)}{s}} \ge \Delta_i - \varepsilon\right\}\right] \\
&\le 1 + \frac{2}{(\Delta_i - \varepsilon)^2}\left(\log f(n) + \sqrt{\pi\log f(n)} + 1\right)\,.
\end{align*}

The first part of the theorem follows by substituting the results of the previous
two displays into (8.4). The second part follows by choosing ε = log^{−1/4}(n) and
taking the limit as n tends to infinity.
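To see why this choice of ε works, here is a sketch of the limiting calculation, assuming (as suggested by the proof) that the earlier display contributes a term of order 1/ε² while the display above contributes the log f(n) term:
\begin{align*}
\varepsilon_n = \log^{-1/4}(n)
\quad\Longrightarrow\quad
\frac{1}{\varepsilon_n^2} = \sqrt{\log n} = o(\log n)
\quad\text{and}\quad
\frac{2\log f(n)}{(\Delta_i - \varepsilon_n)^2} = \frac{2\log n}{\Delta_i^2}\,(1 + o(1))\,,
\end{align*}
since log f(n) = log(1 + n log^2(n)) = (1 + o(1)) log n and ε_n → 0. Dividing the regret bound by log(n) and letting n tend to infinity therefore leaves only the 2/Δ_i^2 contributions.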


8.2 Notes


1 The improvement to the constants comes from making the confidence interval
slightly smaller, which is made possible by a more careful analysis. The main
trick is the observation that we do not need to show that μ̂_{1s} ≥ μ_1 for all s
with high probability, but instead that μ̂_{1s} ≥ μ_1 − ε for small ε.
2 The choice of f(t) = 1 + t log^2(t) looks quite odd. As we pointed out in the
proof, things would not have gone through had we chosen f(t) = t. With a
slightly messier calculation we could have chosen f(t) = t log^α(t) for any α > 0.
If the rewards are actually Gaussian, then a more careful concentration analysis
allows one to choose f(t) = t or even some slightly slower-growing function
[Katehakis and Robbins, 1995, Lattimore, 2016a, Garivier et al., 2016b]. A small
code sketch of the index with this choice of f appears after these notes.
3 The asymptotic regret is often indicative of finite-time performance. The reader
is advised to be cautious, however. The lower-order terms obscured by the
asymptotics can be dominant in all practical regimes, and problems can exhibit
phase transitions where the asymptotic behavior differs markedly from that in
intermediate regimes.
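The sketch referred to in Note 2: a minimal Python simulation of the index policy μ̂_i(t−1) + sqrt(2 log f(t) / T_i(t−1)) with f(t) = 1 + t log^2(t). The unit-variance Gaussian rewards, the function names and the pseudo-regret bookkeeping are illustrative assumptions, not the book's code.

import math
import random

def f(t):
    # Exploration function discussed in Note 2: f(t) = 1 + t log^2(t).
    return 1 + t * math.log(t) ** 2

def index(mean, count, t):
    # The index mu_hat_i(t-1) + sqrt(2 log f(t) / T_i(t-1)) from the display above.
    return mean + math.sqrt(2 * math.log(f(t)) / count)

def run_ucb(arm_means, n, rng=random):
    # Simulation with unit-variance Gaussian rewards (an assumption made for
    # illustration; the analysis only needs 1-subgaussian noise).
    k = len(arm_means)
    counts = [0] * k      # T_i(t-1): number of plays of arm i so far
    means = [0.0] * k     # mu_hat_i(t-1): empirical mean of arm i so far
    best = max(arm_means)
    regret = 0.0          # pseudo-regret: sum of gaps of the chosen arms
    for t in range(1, n + 1):
        if t <= k:
            a = t - 1     # play each arm once to initialise the means
        else:
            a = max(range(k), key=lambda i: index(means[i], counts[i], t))
        reward = rng.gauss(arm_means[a], 1.0)
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]
        regret += best - arm_means[a]
    return regret

print(run_ucb([0.5, 0.0], n=10_000))  # grows roughly like (2 / 0.5^2) log n for large n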

8.3 Bibliographic remarks


Lai and Robbins [1985] designed policies for which Eq. (8.2) holds. They also
proved a lower bound showing that no ‘reasonable’ policy can improve on this
bound for any problem, where ‘reasonable’ means that they suffer subpolynomial