Bandit Algorithms

14.2 The relative entropy 181

The proof may be found at the end of the chapter, but first some interpretation and a simple application. Suppose thatD(P,Q) is small, thenPis close toQin some sense. SincePis a probability measure we haveP(A) +P(Ac) = 1. IfQis close toP, then we might expectP(A) +Q(Ac) should be large. The purpose of the theorem is to quantify just how large. Note that ifP is not absolutely continuous with respect toQthenD(P,Q) =∞and the result is vacuous. Also note that the result is symmetric. We could replaceD(P,Q) withD(Q,P), which sometimes leads to a stronger result because the relative entropy is not symmetric. Returning to the hypothesis testing problem described in the previous chapter. LetXbe normally distributed with unknown meanμ∈ { 0 ,∆}and variance σ^2 >0. We want to bound the quality of a rule for deciding what is the real mean from a single observation. The decision rule is characterized by a measurable setA⊆Ron which the predictor guessesμ= ∆ (it predictsμ= 0 on the complement ofA). LetP=N(0,σ^2 ) andQ=N(∆,σ^2 ). Then the probability of an error underPisP(A) and the probability of error underQisQ(Ac). The reader surely knows what to do next. By Theorem 14.2 we have

P(A) +Q(Ac)≥^1 2

exp (−D(P,Q)) =^1 2

exp

(

−∆

2 2 σ^2

)

.

If we assume that the signal to noise ratio is small, ∆^2 /σ^2 ≤1, then

P(A) +Q(Ac)≥

1

2

exp

(

−

1

2

)

≥

3

10

,

which impliesmax{P(A),Q(Ac)} ≥ 3 /20. This means that no matter how we
chose our decision rule, we simply do not have enough data to make a decision
for which the probability of error on eitherPorQis smaller than 3/20.

Proof of Theorem 14.2 For realsa,bwe abbreviatemax{a,b}= a∨band
min{a,b}=a∧b. The result is trivial ifD(P,Q) =∞. On the other hand, by
Theorem 14.1,D(P,Q)<∞implies thatPQ. Letν=P+Q. ThenP,Qν,
which by Theorem 2.13 ensures the existence of the Radon-Nikodym derivatives
p=dPdν andq=dQdν. The chain rule gives

dP dQ

dQ dν

=

dP dν

and

dP dQ

=

dP dν dQ dν

.

Therefore

D(P,Q) =

∫

plog

(

p q

)

dν.

For brevity, when writing integrals with respect toν, in this proof, we will drop dν. Thus, we will write, for example

∫

plog(p/q) for the above integral. Instead of (14.6), we prove the stronger result that ∫ p∧q≥

1

2

exp(−D(P,Q)). (14.7)

Bandit Algorithms

(

−∆

)

.

1

2

(

−

1

2

)

≥

3

10

,

=

=

.

∫

(

)

∫

1

2

Get our desktop app

Company

Features

Documentation

Resources