21.1 The Kiefer–Wolfowitz theorem 252
In the subfield of statistics called optimal experimental design, the distribution
πis called adesignand the problem of finding a design that minimizesgis
called theG-optimal design problem. So how to use this? Suppose thatπis
a design anda∈Supp(π) and
na=
⌈
π(a)g(π)
ε^2
log
(
1
δ
)⌉
. (21.3)
Then choosing each actiona∈Supp(π) exactlynatimes ensures that
V=
∑
a∈Supp(π)
naaa>≥g(π)
ε^2
log
(
1
δ
)
V(π),
which by Eq. (21.1) means that for anya∈A, with probability 1−δ,
〈θˆ−θ∗,a〉≤
√
2 ‖a‖^2 V− 1 log
(
1
δ
)
≤ε.
By Eq. (21.3), the total number of actions required to ensure a confidence width
of no more thanεis bounded by
n=
∑
a∈Supp(π)
na=
∑
a∈Supp(π)
⌈
π(a)g(π)
ε^2 log
(
1
δ
)⌉
≤|Supp(π)|+
g(π)
ε^2 log
(
1
δ
)
.
The set Supp(π) is sometimes called thecore set. The following theorem
characterizes the size of the core set and the minimum ofg.
Theorem 21.1 (Kiefer–Wolfowitz). Assume that A ⊂ Rd is compact and
span(A) =Rd. The fol lowing are equivalent:
(a)π∗is a minimizer ofg.
(b)π∗is a maximizer off(π) = log detV(π).
(c)g(π∗) =d.
Furthermore, there exists a minimizerπ∗ofgsuch that|Supp(π∗)|≤d(d+ 1)/ 2.
A design that maximizesfis known as aD-optimal designand thus the
theorem establishes the equivalence of G-optimal and D-optimal designs.
Proof We give the proof for finiteA. The general case follows by passing to the
limit (Exercise 21.3). When it is convenient, distributionsπonAare treated as
vectors inR|A|. You will show in Exercises 21.1 and 21.2 thatfis concave and
that
(∇f(π))a=‖a‖^2 V(π)− 1. (21.4)
Also notice that
∑
a∈A
π(a)‖a‖^2 V(π)− 1 = trace
(
∑
a
π(a)aa>V(π)−^1
)
= trace(I) =d. (21.5)