Bandit Algorithms

(Jeff_L) #1
14.2 The relative entropy 181

The proof may be found at the end of the chapter, but first some interpretation
and a simple application. Suppose thatD(P,Q) is small, thenPis close toQin
some sense. SincePis a probability measure we haveP(A) +P(Ac) = 1. IfQis
close toP, then we might expectP(A) +Q(Ac) should be large. The purpose
of the theorem is to quantify just how large. Note that ifP is not absolutely
continuous with respect toQthenD(P,Q) =∞and the result is vacuous. Also
note that the result is symmetric. We could replaceD(P,Q) withD(Q,P), which
sometimes leads to a stronger result because the relative entropy is not symmetric.
Returning to the hypothesis testing problem described in the previous chapter.
LetXbe normally distributed with unknown meanμ∈ { 0 ,∆}and variance
σ^2 >0. We want to bound the quality of a rule for deciding what is the real mean
from a single observation. The decision rule is characterized by a measurable
setA⊆Ron which the predictor guessesμ= ∆ (it predictsμ= 0 on the
complement ofA). LetP=N(0,σ^2 ) andQ=N(∆,σ^2 ). Then the probability
of an error underPisP(A) and the probability of error underQisQ(Ac). The
reader surely knows what to do next. By Theorem 14.2 we have

P(A) +Q(Ac)≥^1
2

exp (−D(P,Q)) =^1
2

exp

(


−∆


2
2 σ^2

)


.


If we assume that the signal to noise ratio is small, ∆^2 /σ^2 ≤1, then

P(A) +Q(Ac)≥

1


2


exp

(



1


2


)



3


10


,


which impliesmax{P(A),Q(Ac)} ≥ 3 /20. This means that no matter how we
chose our decision rule, we simply do not have enough data to make a decision
for which the probability of error on eitherPorQis smaller than 3/20.


Proof of Theorem 14.2 For realsa,bwe abbreviatemax{a,b}= a∨band
min{a,b}=a∧b. The result is trivial ifD(P,Q) =∞. On the other hand, by
Theorem 14.1,D(P,Q)<∞implies thatPQ. Letν=P+Q. ThenP,Qν,
which by Theorem 2.13 ensures the existence of the Radon-Nikodym derivatives
p=dPdν andq=dQdν. The chain rule gives


dP
dQ

dQ

=


dP

and

dP
dQ

=


dP

dQ

.


Therefore


D(P,Q) =


plog

(


p
q

)


dν.

For brevity, when writing integrals with respect toν, in this proof, we will drop
dν. Thus, we will write, for example


plog(p/q) for the above integral. Instead
of (14.6), we prove the stronger result that

p∧q≥

1


2


exp(−D(P,Q)). (14.7)
Free download pdf