Bandit Algorithms

21.1 The Kiefer–Wolfowitz theorem 252

In the subfield of statistics called optimal experimental design, the distribution πis called adesignand the problem of finding a design that minimizesgis called theG-optimal design problem. So how to use this? Suppose thatπis a design anda∈Supp(π) and

na=

⌈

π(a)g(π) ε^2

log

(

1

δ

)⌉

. (21.3)

Then choosing each actiona∈Supp(π) exactlynatimes ensures that

V=

∑

a∈Supp(π)

naaa>≥g(π) ε^2

log

(

1

δ

)

V(π),

which by Eq. (21.1) means that for anya∈A, with probability 1−δ,

〈θˆ−θ∗,a〉≤

√

2 ‖a‖^2 V− 1 log

(

1

δ

)

≤ε.

By Eq. (21.3), the total number of actions required to ensure a confidence width of no more thanεis bounded by

n=

∑

a∈Supp(π)

na=

∑

a∈Supp(π)

⌈

π(a)g(π) ε^2 log

(

1

δ

)⌉

≤|Supp(π)|+

g(π) ε^2 log

(

1

δ

)

.

The set Supp(π) is sometimes called thecore set. The following theorem
characterizes the size of the core set and the minimum ofg.

Theorem 21.1 (Kiefer–Wolfowitz). Assume that A ⊂ Rd is compact and
span(A) =Rd. The fol lowing are equivalent:

(a)π∗is a minimizer ofg.
(b)π∗is a maximizer off(π) = log detV(π).
(c)g(π∗) =d.

Furthermore, there exists a minimizerπ∗ofgsuch that|Supp(π∗)|≤d(d+ 1)/ 2.

A design that maximizesfis known as aD-optimal designand thus the theorem establishes the equivalence of G-optimal and D-optimal designs.

Proof We give the proof for finiteA. The general case follows by passing to the limit (Exercise 21.3). When it is convenient, distributionsπonAare treated as vectors inR|A|. You will show in Exercises 21.1 and 21.2 thatfis concave and that

(∇f(π))a=‖a‖^2 V(π)− 1. (21.4)

Also notice that

∑

a∈A

π(a)‖a‖^2 V(π)− 1 = trace

(

∑

a

π(a)aa>V(π)−^1

)

= trace(I) =d. (21.5)

Bandit Algorithms

⌈

(

1

)⌉

. (21.3)

∑

(

1

)

√

(

1

)

∑

∑

⌈

(

1

)⌉

(

1

)

.

(

∑

)

Get our desktop app

Company

Features

Documentation

Resources