14.3 Notes

[Figure 14.1: Tightening the inequality of Le Cam]


is large, Eq. (14.12) is worse than Eq. (14.10) or the inequality in Theorem 14.2.
This is the reason we call the inequality in Theorem 14.2 the high-probability
Pinsker inequality.
5 We saw the total variation distance in Eq. (14.11). There are two other
‘distances’ that are occasionally useful: the Hellinger distance and the
χ-squared distance, which, using the notation in the proof of Theorem 14.2,
are defined by


h(P,Q) = √(∫(√p − √q)^2) = √(2(1 − ∫√(pq))),  (14.13)

χ^2(P,Q) = ∫(p − q)^2/q = ∫p^2/q − 1.  (14.14)


The Hellinger distance is bounded and exists for all probability measures P
and Q. A necessary condition for the χ^2-distance to exist is that P ≪ Q. Like
the total variation distance, the Hellinger distance is actually a distance (it is
symmetric and satisfies the triangle inequality), but the χ^2-‘distance’ is not. It is
possible to show (Tsybakov [2008], Chapter 2) that

δ(P,Q)^2 ≤ h(P,Q)^2 ≤ D(P,Q) ≤ χ^2(P,Q). (14.15)

All the inequalities are tight for some choices of P and Q, but the examples
do not chain together, as evidenced by Pinsker’s inequality, which shows that
δ(P,Q)^2 ≤ D(P,Q)/2 (which is also tight for some P and Q).
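
To make the chain in Eq. (14.15) and Pinsker’s inequality concrete, here is a minimal numerical sketch that evaluates all four quantities for two Bernoulli distributions. The particular parameters (0.3 and 0.6) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Hypothetical example: P = Bernoulli(0.3), Q = Bernoulli(0.6), written as
# probability vectors over the outcomes {0, 1}.
p = np.array([0.7, 0.3])
q = np.array([0.4, 0.6])

tv = 0.5 * np.abs(p - q).sum()                    # total variation delta(P, Q)
hell_sq = np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)  # h(P, Q)^2 from Eq. (14.13)
kl = np.sum(p * np.log(p / q))                    # relative entropy D(P, Q)
chi_sq = np.sum((p - q) ** 2 / q)                 # chi-squared from Eq. (14.14)

# The chain in Eq. (14.15): delta^2 <= h^2 <= D <= chi^2.
assert tv ** 2 <= hell_sq <= kl <= chi_sq
# Pinsker's inequality: delta^2 <= D / 2.
assert tv ** 2 <= kl / 2
print(tv ** 2, hell_sq, kl, chi_sq)
```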
6 The entropy of a distribution P was defined as H(P) in Eq. (14.3). If X is a
random variable, then H(X) is defined to be the entropy of the law of X. This
is a convenient notation because it allows one to write H(f(X)) and H(XY)
and similar expressions.
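
As a small illustration of this convention, the following sketch computes H(X) and H(f(X)) by forming the law of X and pushing it through f. The uniform law on {0, 1, 2, 3}, the map f(x) = x mod 2, and the use of the natural logarithm are assumptions made for the example, not taken from Eq. (14.3).

```python
import numpy as np

def entropy(law):
    # Entropy of a discrete law given as {outcome: probability}, using the
    # natural logarithm (an assumed convention for Eq. (14.3)).
    return float(sum(-p * np.log(p) for p in law.values() if p > 0))

# Hypothetical random variable X, uniform on {0, 1, 2, 3}; its law assigns
# probability 1/4 to each outcome, so H(X) is the entropy of that law.
law_X = {x: 0.25 for x in range(4)}

# H(f(X)) is the entropy of the law of f(X): push the law of X through f.
f = lambda x: x % 2
law_fX = {}
for x, prob in law_X.items():
    law_fX[f(x)] = law_fX.get(f(x), 0.0) + prob

print(entropy(law_X))    # log(4), about 1.386
print(entropy(law_fX))   # log(2), about 0.693; never exceeds H(X) when f is deterministic
```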
