21.1 The Kiefer–Wolfowitz theorem 253(b)⇒(a): Suppose thatπ∗is a maximizer off. By the first-order optimality
criterion (see Section 26.5), for anyπdistribution onA,
0 ≥〈∇f(π∗),π−π∗〉
=∑
a∈Aπ(a)‖a‖^2 V(π∗)− 1 −∑
a∈Aπ∗(a)‖a‖^2 V(π∗)− 1=
∑
a∈Aπ(a)‖a‖^2 V(π∗)− 1 −d.For an arbitrarya ∈ A, choosing πto be the Dirac ata∈ Aproves that
‖a‖^2 V(π∗)− 1 ≤d. Henceg(π∗)≤d. Sinceg(π)≥dfor allπby Eq. (21.5), it follows
thatπ∗is a minimizer ofgand thatminπg(π) =d. (c) =⇒(b): Suppose that
g(π∗) =d. Then for anyπ,〈∇f(π∗),π−π∗〉=∑
a∈Aπ(a)‖a‖V(π∗)− 1 −d≤ 0.And it follows thatπ∗is a maximizer offby the first order optimality conditions
and the concavity off. That (a) =⇒(c) is now trivial. To prove the second
part of the theorem letπ∗be a minimizer ofg, which by the previous part is a
maximizer off. LetS=Supp(π∗) and suppose that|S|> d(d+ 1)/2. Since the
dimension of the subspace ofd×dsymmetric matrices isd(d+ 1)/2, there must
be a nonzero functionv:A→Rwith Supp(v)⊆Ssuch that
∑
a∈Sv(a)aa>= 0. (21.6)Notice that for anya ∈Sthe first order optimality conditions ensure that
‖a‖^2 V(π∗)− 1 =d(Exercise 21.5). Henced∑
a∈Sv(a) =∑
a∈Sv(a)‖a‖^2 V(π∗)− 1 = 0,where the last equality follows from Eq. (21.6). Letπ(t) =π∗+tvand assume
‖v‖ 2 = 1 and letT= max{t:π(t)∈PA}, which exists sincev 6 = 0. Then
f(π(T)) =f(π∗) +∫T
0d
dtf(π(t))dt=f(π∗) +∫T
0〈∇f(π(t)),v〉dt=f(π∗) +∫T
0∑
a∈Av(a)‖a‖^2 V(π(t))− 1 dt=f(π∗),where we used the fact thatV(π(t)) is constant and‖a‖^2 V(π(t))− 1 =dfor all
a∈Supp(v). Henceπ(T) also maximizes f. The claim follows by checking
that thanks to
∑
a∈Sv(a) = 0,|Supp(π(T))|<|Supp(π∗)|and then using
induction.