Bandit Algorithms

(Jeff_L) #1
This material will be published by Cambridge University Press as Bandit
Algorithms by Tor Lattimore and Csaba Szepesvari. This pre-publication version
is free to view and download for personal use only. Not for re-distribution, sale,
or use in derivative works.©Tor Lattimore and Csaba Szepesvari 2017.
The latest version is available athttp://banditalgs.com. Feedback on any
aspect is very welcome: [email protected]

14 Foundations of Information Theory ( )


To make the arguments in the previous chapter rigorous and generalizable to
other settings we need some tools from information theory and statistics. The
most important of these is therelative entropy, also known as theKullback-
Leibler divergencenamed for Solomon Kullback and Richard Leibler (KL
divergence, for short).

14.1 Entropy and optimal coding


Alice wants to communicate with Bob. She wants to tell Bob the outcome of a
sequencenof independent random variables sampled from known distributionQ.
Alice and Bob agree to communicate using a binary code that is fixed in advance
in such a way that the expected message length is minimized. TheentropyofQ
is the expected number of bits necessary per random variable using the optimal
code asntends to infinity. The relative entropy between distributionsPandQ
is the price in terms of expected message length that Alice and Bob have to pay
if they believe the random variables are sampled fromQwhen in fact they are
sampled fromP.
LetPbe a measure on [N] withσ-algebra 2[N]andX: [N]→[N] be the
identity random variable,X(ω) =ω. Alice observes a realization ofXand wants
to communicate the result to Bob using abinary codethat they agree upon
in advance. For example, whenN= 4 they might agree on the following code:
1 → 00 , 2 → 01 , 3 → 10 , 4 →11. Then if Alice observes a 3, she sends Bob a
message containing 10. For our purposes, a code is a functionc: [N]→{ 0 , 1 }∗
where{ 0 , 1 }∗is the set of finite sequences of zeros and ones.
Of coursecmust be injective so that no two numbers (orsymbols) have the
same code. We also require thatcbeprefix free, which means that no code is a
prefix of any other. This is justified by supposing that Alice would like to tell
Bob about multiple samples. Then Bob needs to know where the message for one
symbol starts and ends.

Using a prefix code is not the only way to enforce unique decodability, but
all uniquely decodable codes have equivalent prefix codes (see Note 1).
Free download pdf