Bandit Algorithms

15.3 Notes

μ to show that

\[
\begin{aligned}
D(P_\mu, P_{\mu+\Delta})
&\approx \frac{\partial}{\partial\Delta} D(P_\mu, P_{\mu+\Delta}) \bigg|_{\Delta=0} \Delta
 + \frac{1}{2}\,\frac{\partial^2}{\partial\Delta^2} D(P_\mu, P_{\mu+\Delta}) \bigg|_{\Delta=0} \Delta^2 \\
&= \int \frac{\partial}{\partial\Delta} \log\!\left(\frac{dP_\mu}{dP_{\mu+\Delta}}\right) \bigg|_{\Delta=0} dP_\mu\, \Delta + \frac{1}{2} I(\mu)\Delta^2 \\
&= -\int \frac{\partial}{\partial\Delta} \log\!\left(\frac{dP_{\mu+\Delta}}{dP_\mu}\right) \bigg|_{\Delta=0} dP_\mu\, \Delta + \frac{1}{2} I(\mu)\Delta^2 \\
&= -\int \frac{\partial}{\partial\Delta} \frac{dP_{\mu+\Delta}}{dP_\mu} \bigg|_{\Delta=0} dP_\mu\, \Delta + \frac{1}{2} I(\mu)\Delta^2 \\
&= -\frac{\partial}{\partial\Delta} \int \frac{dP_{\mu+\Delta}}{dP_\mu}\, dP_\mu \bigg|_{\Delta=0} \Delta + \frac{1}{2} I(\mu)\Delta^2 \\
&= -\frac{\partial}{\partial\Delta} \int dP_{\mu+\Delta} \bigg|_{\Delta=0} \Delta + \frac{1}{2} I(\mu)\Delta^2 \\
&= \frac{1}{2} I(\mu)\Delta^2\,,
\end{aligned}
\]

where I(μ), introduced in the second line, is called the Fisher information
of the family (P_μ)_μ at μ. Note that if λ is a common dominating measure for
(P_{μ+∆}) for ∆ small, then dP_{μ+∆} = p_{μ+∆} dλ and we can write

\[
I(\mu) = -\int \frac{\partial^2}{\partial\Delta^2} \log p_{\mu+\Delta} \bigg|_{\Delta=0} p_\mu\, d\lambda\,,
\]

which is the form that is usually given in elementary texts. The upshot of all
this is that D(P_μ, P_{μ+∆}) for ∆ small is indeed quadratic in ∆, with the
scaling provided by I(μ), and as a result the worst-case regret is always
O(√(nk)), provided the class of distributions considered is sufficiently rich
and not too bizarre.
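As a quick numerical sanity check (an illustration, not from the text), the quadratic approximation can be verified for the Bernoulli family, whose Fisher information at μ is I(μ) = 1/(μ(1 − μ)):

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """Relative entropy D(Ber(p), Ber(q))."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

mu = 0.3
fisher = 1.0 / (mu * (1.0 - mu))  # Fisher information of Ber(mu)

# The ratio D(P_mu, P_{mu+delta}) / ((1/2) I(mu) delta^2) tends to 1
# as delta shrinks, as the Taylor expansion predicts.
for delta in [0.1, 0.01, 0.001]:
    exact = kl_bernoulli(mu, mu + delta)
    approx = 0.5 * fisher * delta ** 2
    print(f"delta={delta:g}  D={exact:.3e}  approx={approx:.3e}  ratio={exact / approx:.4f}")
```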
2 We have now shown a lower bound that is Ω(√(nk)), while many of the upper
bounds were O(log(n)). There is no contradiction because the logarithmic
bounds depended on the inverse suboptimality gaps, which may be very large.
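To see the scales involved (an illustrative computation with hypothetical values of n and k): if every gap is as small as √(k/n), the gap-dependent bound, which scales like the sum over suboptimal arms of log(n)/∆_i, already exceeds the worst-case √(nk) rate:

```python
import math

n, k = 100_000, 10
delta = math.sqrt(k / n)  # gaps at the worst-case scale

gap_bound = (k - 1) * math.log(n) / delta  # ~ sum_i log(n) / Delta_i
worst_case = math.sqrt(n * k)

print(f"gap-dependent bound ~ {gap_bound:,.0f}")  # larger by roughly a log(n) factor
print(f"worst-case bound    ~ {worst_case:,.0f}")
```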
3 Our lower bound was only proven for n ≥ k − 1. In Exercise 15.3 we ask you
to show that when n < k − 1 there exists a bandit such that

\[
R_n \geq \frac{n(2k-n-1)}{2k} > \frac{n}{2}\,.
\]
4 The method used to prove Theorem 15.2 can be viewed as a generalization
and strengthening of Le Cam's method in statistics. Recall that Eq. (15.2)
establishes that for any μ and μ′,

\[
\inf_\pi \sup_\nu R_n(\pi,\nu) \geq \frac{n\Delta}{8} \exp\left(-D(P_\mu, P_{\mu'})\right).
\]

To explain Le Cam's method we need a little notation. Let X be an outcome
space, P a set of measures on X, and θ : P → Θ, where (Θ, d) is a metric space.