Bandit Algorithms

(Jeff_L) #1
21.1 The Kiefer–Wolfowitz theorem 252

In the subfield of statistics called optimal experimental design, the distribution
πis called adesignand the problem of finding a design that minimizesgis
called theG-optimal design problem. So how to use this? Suppose thatπis
a design anda∈Supp(π) and

na=


π(a)g(π)
ε^2

log

(


1


δ

)⌉


. (21.3)


Then choosing each actiona∈Supp(π) exactlynatimes ensures that


V=


a∈Supp(π)

naaa>≥g(π)
ε^2

log

(


1


δ

)


V(π),

which by Eq. (21.1) means that for anya∈A, with probability 1−δ,


〈θˆ−θ∗,a〉≤


2 ‖a‖^2 V− 1 log

(


1


δ

)


≤ε.

By Eq. (21.3), the total number of actions required to ensure a confidence width
of no more thanεis bounded by

n=


a∈Supp(π)

na=


a∈Supp(π)


π(a)g(π)
ε^2 log

(


1


δ

)⌉


≤|Supp(π)|+

g(π)
ε^2 log

(


1


δ

)


.


The set Supp(π) is sometimes called thecore set. The following theorem
characterizes the size of the core set and the minimum ofg.


Theorem 21.1 (Kiefer–Wolfowitz). Assume that A ⊂ Rd is compact and
span(A) =Rd. The fol lowing are equivalent:


(a)π∗is a minimizer ofg.
(b)π∗is a maximizer off(π) = log detV(π).
(c)g(π∗) =d.


Furthermore, there exists a minimizerπ∗ofgsuch that|Supp(π∗)|≤d(d+ 1)/ 2.


A design that maximizesfis known as aD-optimal designand thus the
theorem establishes the equivalence of G-optimal and D-optimal designs.

Proof We give the proof for finiteA. The general case follows by passing to the
limit (Exercise 21.3). When it is convenient, distributionsπonAare treated as
vectors inR|A|. You will show in Exercises 21.1 and 21.2 thatfis concave and
that

(∇f(π))a=‖a‖^2 V(π)− 1. (21.4)

Also notice that



a∈A

π(a)‖a‖^2 V(π)− 1 = trace

(



a

π(a)aa>V(π)−^1

)


= trace(I) =d. (21.5)
Free download pdf