Bandit Algorithms

This material will be published by Cambridge University Press as Bandit Algorithms by Tor Lattimore and Csaba Szepesvari. This pre-publication version is free to view and download for personal use only. Not for re-distribution, sale, or use in derivative works.©Tor Lattimore and Csaba Szepesvari 2017. The latest version is available athttp://banditalgs.com. Feedback on any aspect is very welcome: [email protected]

14 Foundations of Information Theory ( )

To make the arguments in the previous chapter rigorous and generalizable to other settings we need some tools from information theory and statistics. The most important of these is therelative entropy, also known as theKullback- Leibler divergencenamed for Solomon Kullback and Richard Leibler (KL divergence, for short).

14.1 Entropy and optimal coding

Alice wants to communicate with Bob. She wants to tell Bob the outcome of a sequencenof independent random variables sampled from known distributionQ. Alice and Bob agree to communicate using a binary code that is fixed in advance in such a way that the expected message length is minimized. TheentropyofQ is the expected number of bits necessary per random variable using the optimal code asntends to infinity. The relative entropy between distributionsPandQ is the price in terms of expected message length that Alice and Bob have to pay if they believe the random variables are sampled fromQwhen in fact they are sampled fromP. LetPbe a measure on [N] withσ-algebra 2[N]andX: [N]→[N] be the identity random variable,X(ω) =ω. Alice observes a realization ofXand wants to communicate the result to Bob using abinary codethat they agree upon in advance. For example, whenN= 4 they might agree on the following code: 1 → 00 , 2 → 01 , 3 → 10 , 4 →11. Then if Alice observes a 3, she sends Bob a message containing 10. For our purposes, a code is a functionc: [N]→{ 0 , 1 }∗ where{ 0 , 1 }∗is the set of finite sequences of zeros and ones. Of coursecmust be injective so that no two numbers (orsymbols) have the same code. We also require thatcbeprefix free, which means that no code is a prefix of any other. This is justified by supposing that Alice would like to tell Bob about multiple samples. Then Bob needs to know where the message for one symbol starts and ends.

Using a prefix code is not the only way to enforce unique decodability, but all uniquely decodable codes have equivalent prefix codes (see Note 1).

Bandit Algorithms

14 Foundations of Information Theory ( )

14.1 Entropy and optimal coding

Get our desktop app

Company

Features

Documentation

Resources