36.4 Information theoretic analysis
from context we write $H_P$. Let $\mathcal{G} \subseteq \mathcal{F}$ be a sub-$\sigma$-algebra. Then the conditional entropy of $X$ given $\mathcal{G}$ is
$$
H(X \,|\, \mathcal{G}) = E\left[\sum_{x \in \operatorname{range}(X)} P(X = x \,|\, \mathcal{G}) \log\left(\frac{1}{P(X = x \,|\, \mathcal{G})}\right)\right].
$$
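For a concrete instance of the definition, suppose $X$ is uniform on $\{1, 2, 3, 4\}$ and $\mathcal{G} = \sigma(\{X \leq 2\})$. On either atom of $\mathcal{G}$ the conditional distribution of $X$ is uniform over two values, so $P(X = x \,|\, \mathcal{G}) = 1/2$ for the two admissible values of $x$ and the inner sum equals $\log 2$ on every atom. Hence
$$
H(X \,|\, \mathcal{G}) = E[\log 2] = \log 2,
$$
while $H(X) = \log 4$; observing $\mathcal{G}$ halves the number of bits needed to encode $X$.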
Despite the nomenclature and the notation, $H(X \,|\, \mathcal{G})$ is a real value, not a random variable. Noting that $H(X \,|\, \mathcal{G}) = E\big[H_{P(\cdot\,|\,\mathcal{G})}(X)\big]$, a better nomenclature would have been the expected conditional entropy, but we follow the standard convention. Given $P$, the entropy of random variable $X$ is a measure of the amount of information in $X$, while the conditional entropy given $\mathcal{G}$ is the expected amount of information required to encode $X$ having observed the information in $\mathcal{G}$. The mutual information between $X$ and $\mathcal{G}$ is the difference between the entropy and the conditional entropy:
$$
I(X; \mathcal{G}) = H(X) - H(X \,|\, \mathcal{G}).
$$
The mutual information is always nonnegative, which should not be surprising because the information remaining in $X$ can only decrease as more information is observed. Another name for the mutual information is information gain.
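These quantities are straightforward to compute when $X$ is discrete and $\mathcal{G} = \sigma(Y)$ for a discrete $Y$. The Python sketch below (the joint pmf is an arbitrary choice made only for illustration) evaluates $H(X)$, $H(X \,|\, \sigma(Y))$ and $I(X; \sigma(Y))$ directly from the definitions above; the mutual information comes out nonnegative, as claimed.

```python
import math

# Arbitrary joint pmf P(X = x, Y = y), chosen only to illustrate the definitions.
joint = {(0, 0): 0.30, (0, 1): 0.10,
         (1, 0): 0.15, (1, 1): 0.45}
xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginal of X
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginal of Y

# H(X) = sum_x P(X = x) log(1 / P(X = x))
H_X = sum(p * math.log(1 / p) for p in p_x.values() if p > 0)

# H(X | sigma(Y)) = E[ sum_x P(X = x | Y) log(1 / P(X = x | Y)) ],
# the expectation being a weighted sum over the atoms {Y = y}.
H_X_given_Y = sum(
    p_y[y] * sum((joint[(x, y)] / p_y[y]) * math.log(p_y[y] / joint[(x, y)])
                 for x in xs if joint[(x, y)] > 0)
    for y in ys)

# I(X; sigma(Y)) = H(X) - H(X | sigma(Y)), the information gain.
I_XY = H_X - H_X_given_Y
assert I_XY >= 0
print(H_X, H_X_given_Y, I_XY)
```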
As with the entropy, we use these forms when the underlying measure is clear from context. If this is not the case, then the measure is shown in the subscript: $H(X) = H_P(X)$, $H(X \,|\, \mathcal{G}) = H_P(X \,|\, \mathcal{G})$, and $I(X; \mathcal{G}) = I_P(X; \mathcal{G})$. The following lemmas provide a chain rule for the mutual information and a simple connection to the relative entropy. The proofs are definitional and are left to Exercise 36.7.
Lemma 36.5. Let $X$ be a random variable on $(\Omega, \mathcal{F}, P)$ and $(\mathcal{F}_t)_{t=0}^n$ a filtration of $\mathcal{F}$ with $\mathcal{F}_0 = \{\emptyset, \Omega\}$ and $P_t(\cdot) = P(\cdot \,|\, \mathcal{F}_t)$. Then,
$$
E\left[\sum_{t=1}^{n} I_{P_{t-1}}(X; \mathcal{F}_t)\right] = I_P(X; \mathcal{F}_n).
$$
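The lemma can be checked numerically on a toy model. In the Python sketch below (a model of our own choosing), $X$ is a fair coin, $Y_1, Y_2$ are conditionally independent noisy copies of $X$ with flip probability eps, and $\mathcal{F}_t = \sigma(Y_1, \dots, Y_t)$. Under $P_{t-1}$ the values of $Y_1, \dots, Y_{t-1}$ are fixed almost surely, so $I_{P_{t-1}}(X; \mathcal{F}_t)$ reduces to the mutual information between $X$ and $Y_t$ under the conditional measure.

```python
import math

def mutual_info(joint):
    """I(A; B) for a joint pmf given as {(a, b): probability}."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(p * math.log(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

# Toy model: X is a fair coin; Y1, Y2 are conditionally independent noisy copies
# of X, each flipped with probability eps.  F_1 = sigma(Y1), F_2 = sigma(Y1, Y2).
eps = 0.2
def obs(y, x):                      # P(Y_t = y | X = x)
    return 1 - eps if y == x else eps

joint = {(x, (y1, y2)): 0.5 * obs(y1, x) * obs(y2, x)
         for x in (0, 1) for y1 in (0, 1) for y2 in (0, 1)}

# Right-hand side of the lemma: I_P(X; F_2) = I(X; (Y1, Y2)).
rhs = mutual_info(joint)

# t = 1 term: P_0 = P, so I_{P_0}(X; F_1) = I(X; Y1) under the original measure.
joint_x_y1 = {}
for (x, (y1, _)), p in joint.items():
    joint_x_y1[(x, y1)] = joint_x_y1.get((x, y1), 0.0) + p
term1 = mutual_info(joint_x_y1)

# t = 2 term: E[I_{P_1}(X; F_2)].  Under P_1 = P(. | Y1 = y1) the value of Y1 is
# fixed, so the mutual information with F_2 reduces to that between X and Y2.
term2 = 0.0
for y1 in (0, 1):
    p_y1 = sum(p for (_, (a, _)), p in joint.items() if a == y1)
    cond = {(x, y2): p / p_y1 for (x, (a, y2)), p in joint.items() if a == y1}
    term2 += p_y1 * mutual_info(cond)

assert abs((term1 + term2) - rhs) < 1e-9   # the chain rule of Lemma 36.5
```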
For the next lemma, for random variables $X, Y$ defined on a common probability space we introduce the abbreviation $I(X; Y) = I(X; \sigma(Y))$.
Lemma 36.6. Let $X$ and $Y$ be random variables on probability space $(\Omega, \mathcal{F}, P)$. If $X$ is discrete, then $I(X; Y) = E\big[D(P(Y \in \cdot \,|\, X), P(Y \in \cdot))\big]$.
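A quick numerical check of this identity, again with an arbitrary joint pmf chosen only for illustration: the left-hand side is computed from the entropy definitions, the right-hand side as an average over $x$ of relative entropies between the conditional and marginal laws of $Y$.

```python
import math

# Arbitrary joint pmf P(X = x, Y = y), chosen only for illustration.
joint = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.15, (1, 1): 0.45}
xs, ys = (0, 1), (0, 1)
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

# Left-hand side: I(X; Y) = H(X) - H(X | sigma(Y)).
H_X = sum(p * math.log(1 / p) for p in p_x.values())
H_X_given_Y = sum(
    p_y[y] * sum((joint[(x, y)] / p_y[y]) * math.log(p_y[y] / joint[(x, y)])
                 for x in xs)
    for y in ys)
lhs = H_X - H_X_given_Y

# Right-hand side: E[ D(P(Y in . | X), P(Y in .)) ], the expected relative
# entropy between the conditional law of Y given X and the marginal law of Y.
rhs = sum(
    p_x[x] * sum((joint[(x, y)] / p_x[x]) * math.log((joint[(x, y)] / p_x[x]) / p_y[y])
                 for y in ys)
    for x in xs)

assert abs(lhs - rhs) < 1e-9   # the identity of Lemma 36.6
```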
36.4.2 Bayesian adversarial bandits
In the adversarial bandit model studied in Part III the adversary secretly chooses a matrix $x \in [0, 1]^{n \times k}$ at the start of the game. The reward in round $t$ is $x_{t A_t}$. A Bayesian spin on this setup works as follows. The prior $Q$ is a probability measure on $([0, 1]^{n \times k}, \mathcal{B}([0, 1]^{n \times k}))$, which is known to the learner. At the start of the game a reward matrix $X$ is sampled from $Q$, but not revealed. The learner then chooses actions $(A_t)_{t=1}^n$ and the reward in round $t$ is $X_{t A_t}$. Formally, let