14.2 The relative entropy 181
The proof may be found at the end of the chapter, but first some interpretation
and a simple application. Suppose thatD(P,Q) is small, thenPis close toQin
some sense. SincePis a probability measure we haveP(A) +P(Ac) = 1. IfQis
close toP, then we might expectP(A) +Q(Ac) should be large. The purpose
of the theorem is to quantify just how large. Note that ifP is not absolutely
continuous with respect toQthenD(P,Q) =∞and the result is vacuous. Also
note that the result is symmetric. We could replaceD(P,Q) withD(Q,P), which
sometimes leads to a stronger result because the relative entropy is not symmetric.
Returning to the hypothesis testing problem described in the previous chapter.
LetXbe normally distributed with unknown meanμ∈ { 0 ,∆}and variance
σ^2 >0. We want to bound the quality of a rule for deciding what is the real mean
from a single observation. The decision rule is characterized by a measurable
setA⊆Ron which the predictor guessesμ= ∆ (it predictsμ= 0 on the
complement ofA). LetP=N(0,σ^2 ) andQ=N(∆,σ^2 ). Then the probability
of an error underPisP(A) and the probability of error underQisQ(Ac). The
reader surely knows what to do next. By Theorem 14.2 we have
P(A) +Q(Ac)≥^1
2
exp (−D(P,Q)) =^1
2
exp
(
−∆
2
2 σ^2
)
.
If we assume that the signal to noise ratio is small, ∆^2 /σ^2 ≤1, then
P(A) +Q(Ac)≥
1
2
exp
(
−
1
2
)
≥
3
10
,
which impliesmax{P(A),Q(Ac)} ≥ 3 /20. This means that no matter how we
chose our decision rule, we simply do not have enough data to make a decision
for which the probability of error on eitherPorQis smaller than 3/20.
Proof of Theorem 14.2 For realsa,bwe abbreviatemax{a,b}= a∨band
min{a,b}=a∧b. The result is trivial ifD(P,Q) =∞. On the other hand, by
Theorem 14.1,D(P,Q)<∞implies thatPQ. Letν=P+Q. ThenP,Qν,
which by Theorem 2.13 ensures the existence of the Radon-Nikodym derivatives
p=dPdν andq=dQdν. The chain rule gives
dP
dQ
dQ
dν
=
dP
dν
and
dP
dQ
=
dP
dν
dQ
dν
.
Therefore
D(P,Q) =
∫
plog
(
p
q
)
dν.
For brevity, when writing integrals with respect toν, in this proof, we will drop
dν. Thus, we will write, for example
∫
plog(p/q) for the above integral. Instead
of (14.6), we prove the stronger result that
∫
p∧q≥
1
2
exp(−D(P,Q)). (14.7)