16.6 Classical information and Shannon entropy
Entropy may be given a meaning beyond traditional statistical mechanics. In developing a theory
of classical (non-quantum mechanical) information, Claude Shannon was led to a generalized notion
of entropy that characterizes the amount of missing information for a given ensemble. In the case
of information theory, the ensembles consist of messages, sent in words and sentences. To make
contact with the previous sections, a message may be viewed as a mixture of a certain number of
letters and words.
Consider an ensemble with $n$ possible outcomes, where $n$ is given, for example, by the number
of different letters or words in a text. A probability $p_i$ is assigned to each outcome $i = 1, \cdots, n$,
and the total probability is normalized to 1. This assignment of probabilities characterizes a
macro-state. The statistical or Shannon entropy associated with this macro-state is defined by
    S(p_1, \cdots, p_n) = -k \sum_{i=1}^{n} p_i \ln p_i \,, \qquad \sum_{i=1}^{n} p_i = 1    (16.35)
Here, $k$ is a positive constant. In statistical mechanics, $k$ is the Boltzmann constant, but in
information theory, one usually sets $k = 1$.
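As a simple numerical illustration (not part of the original derivation), the short Python sketch below evaluates (16.35) for a given probability assignment, with $k = 1$ as in information theory; the function name shannon_entropy is our own choice.

    import math

    def shannon_entropy(probs, k=1.0):
        """Shannon entropy S = -k * sum_i p_i ln p_i for a normalized distribution."""
        assert abs(sum(probs) - 1.0) < 1e-12, "probabilities must be normalized to 1"
        # Outcomes with p_i = 0 contribute nothing, since p ln p -> 0 as p -> 0.
        return -k * sum(p * math.log(p) for p in probs if p > 0)

    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # uniform case: ln 4 ≈ 1.386
    print(shannon_entropy([0.5, 0.25, 0.25]))         # ≈ 1.040, less missing information

The uniform assignment $p_i = 1/n$ maximizes the entropy at $S = k \ln n$, corresponding to the largest amount of missing information.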
Shannon derived this formula from a number of simple basic assumptions. Here, we shall
content ourselves with arguing that the formula follows from a large-numbers approach to the
possible outcomes in a message. Suppose we have a message of length $N$ formed out of $n$ different
letters $a_i$, $i = 1, \cdots, n$. We consider this problem in the limit where $N \gg n$. A message then takes
the form
    \ell_1 \, \ell_2 \, \ell_3 \cdots \ell_N    (16.36)
Since the probability $p_i$ for the letter $i$ to occur in this message is given, we expect the letter
$i$ to occur $p_i N$ times. The number $\Omega$ of distinct messages with this assignment of letters gives a
quantitative measure of the amount of missing information. Namely, if you are given the information
that the message contains the letter $i$ precisely $p_i N$ times, the more possible messages there are, the
more information is lacking. This number $\Omega$ of possible messages is also referred to as the number
of micro-states corresponding to a given probability assignment.
With this mixture of letters, and the assumption that the occurrence of the letters in the
message is uncorrelated, $\Omega$ is given by the multinomial coefficient. To make contact with the
statistical entropy formula, we take its logarithm,
    \ln \Omega = \ln \left( \frac{N!}{(p_1 N)! \, (p_2 N)! \cdots (p_n N)!} \right)    (16.37)
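As a small consistency check of this counting (our own illustration, not from the text), one can enumerate all distinct arrangements of a short message by brute force and compare with the multinomial coefficient in (16.37); the Python snippet below uses an example message with letter counts $(3, 2, 1)$, i.e. $N = 6$.

    import math
    from itertools import permutations

    # Example message with the letters 'a', 'b', 'c' occurring 3, 2 and 1 times,
    # i.e. N = 6 and (p_1, p_2, p_3) = (1/2, 1/3, 1/6).
    message = list("aaabbc")
    N = len(message)

    # Brute-force count of distinct messages (feasible only for small N).
    distinct = len(set(permutations(message)))

    # Multinomial coefficient N! / ((p_1 N)! (p_2 N)! ... (p_n N)!).
    counts = [message.count(x) for x in set(message)]
    multinomial = math.factorial(N) // math.prod(math.factorial(c) for c in counts)

    print(distinct, multinomial)  # both equal 60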
Since $N \gg 1$, we can use the Stirling formula to approximate the factorials, $\ln k! \approx k (\ln k - 1)$, and
we get
    \ln \Omega = N (\ln N - 1) - \sum_{i=1}^{n} (p_i N) \Big( \ln (p_i N) - 1 \Big) = -N \sum_{i=1}^{n} p_i \ln p_i    (16.38)
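The quality of this approximation can be checked numerically; the following rough sketch (ours, with illustrative numbers) compares the exact $\ln \Omega$, computed with the log-Gamma function, against the right-hand side of (16.38).

    import math

    probs = [0.5, 0.3, 0.2]
    N = 600  # chosen so that each p_i * N is an integer

    # Exact ln(Omega), using ln k! = lgamma(k + 1).
    counts = [round(p * N) for p in probs]
    ln_omega_exact = math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)

    # Stirling approximation, i.e. the right-hand side of (16.38).
    ln_omega_stirling = -N * sum(p * math.log(p) for p in probs)

    print(ln_omega_exact, ln_omega_stirling)  # ≈ 611.3 versus ≈ 617.8

The two values agree at the percent level already for $N = 600$, and the relative error decreases as $N$ grows, since the terms neglected in the Stirling formula are of order $\ln N$ while $\ln \Omega$ itself grows like $N$.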