measure of disorder to be additive. This is why the logarithm function enters our formula. Letting $\Omega = M^N$ be the total number of all possible $N$-event messages and $K = 1/\ln 2$, we can rewrite the proposed formula as
$$I = K \ln \Omega. \qquad (6.1)$$
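To make the additivity concrete, here is a minimal Python sketch (the function name disorder_bits is ours, for illustration). With $K = 1/\ln 2$, Equation 6.1 is just the base-2 logarithm of $\Omega$, so an $N$-event message carries $N$ times the disorder of a single event:

```python
import math

# Equation 6.1 with K = 1/ln 2: I = K ln(Omega) = log2(Omega), in bits.
def disorder_bits(M, N):
    """Disorder of an N-event message drawn from an M-letter alphabet."""
    omega = M ** N               # total number of possible messages
    return math.log2(omega)      # I = K ln(Omega)

# A 10-letter message in a 33-letter alphabet:
print(disorder_bits(33, 10))     # ~50.44 bits
print(10 * math.log2(33))        # additivity: N times the one-letter value
```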
We also want to measure disorder in other kinds of event streams. Suppose we have a message $N$ letters long in an alphabet with $M$ letters (let's say $M = 33$, Russian), and we know in advance that the letter frequency isn't uniform: there are $N_1$ letters "A," $N_2$ letters "B," and so on. That is, the composition of our stream of symbols is specified, though its sequence is not. The frequency of each letter is then $P_i = N_i/N$, and the $P_i$ aren't necessarily all equal to $1/33$.
The total number of all possible messages is then
$$\Omega = \frac{N!}{N_1!\, N_2! \cdots N_M!}. \qquad (6.2)$$
To justify this formula, we extend the logic of the Example on page 101: there are $N$ factorial (written $N!$) ways to take $N$ objects and arrange them into a sequence. But swapping any of the A's among themselves doesn't change the message, so $N!$ overcounts the possible messages: we need to divide by $N_1!$ to eliminate this redundancy. Arguing similarly for the other letters in the message gives Equation 6.2. (It's always best to test theory with experiment, so try it with two apples, a peach, and a pear ($M = 3$, $N = 4$).)
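As a quick numerical check of Equation 6.2, here is a hedged Python sketch of the fruit experiment: enumerate all orderings by brute force, discard duplicates, and compare with the formula.

```python
from itertools import permutations
from math import factorial

# Two apples, a peach, and a pear: M = 3 kinds, N = 4 objects.
items = ('apple', 'apple', 'peach', 'pear')

# Brute force: all N! orderings, minus duplicates (swapping the two
# identical apples yields the same "message").
distinct = set(permutations(items))

# Equation 6.2: N! / (N1! N2! ... NM!)
formula = factorial(4) // (factorial(2) * factorial(1) * factorial(1))

print(len(distinct), formula)    # both give 12
```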
If all we know about the message is the letter frequencies, then any of the $\Omega$ possible messages is equally likely. Let us apply the proposed disorder formula (Equation 6.1) to the entire message:
$$I = K\left[\ln N! - \sum_{i=1}^{M} \ln N_i!\right].$$
We can simplify this expression if the message is very long, using Stirling's formula (Equation 4.2 on page 102). For very large $N$, we only need to keep the terms in Stirling's formula that are proportional to $N$, namely $\ln N! \approx N \ln N - N$. The $-N$ terms cancel, because $\sum_i N_i = N$, leaving $I = K\left[N \ln N - \sum_i N_i \ln N_i\right]$. Thus the amount of disorder per letter is $I/N = -K \sum_i \frac{N_i}{N} \ln \frac{N_i}{N}$, or
$$\frac{I}{N} = -K \sum_{j=1}^{M} P_j \ln P_j. \qquad \text{Shannon's formula} \qquad (6.3)$$
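To see how quickly Stirling's approximation takes hold, here is an illustrative Python sketch (the function names are ours) comparing the exact per-letter disorder from Equations 6.1 and 6.2, computed via $\ln\Gamma$ to avoid huge factorials, with Shannon's formula:

```python
import math

K = 1.0 / math.log(2.0)          # so that I comes out in bits

def disorder_exact(counts):
    """Exact I/N from Equations 6.1-6.2; lgamma(n+1) = ln(n!)."""
    N = sum(counts)
    ln_omega = math.lgamma(N + 1) - sum(math.lgamma(n + 1) for n in counts)
    return K * ln_omega / N

def disorder_shannon(counts):
    """Asymptotic I/N = -K * sum_j P_j ln P_j (Equation 6.3)."""
    N = sum(counts)
    return -K * sum((n / N) * math.log(n / N) for n in counts if n > 0)

# Fixed letter frequencies P = (0.5, 0.3, 0.2), growing message length:
for scale in (1, 10, 100, 1000):
    counts = [5 * scale, 3 * scale, 2 * scale]
    print(sum(counts), disorder_exact(counts), disorder_shannon(counts))
```

The exact value approaches the Shannon value (about 1.49 bits per letter in this example) from below as $N$ grows.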
Naturally, not every string of letters makes sense in real Russian, even if it has the correct letter frequencies. If we have the extra knowledge that the string consists of real text, then we could take $N$ to be the number of words in the message, $M$ to be the number of words listed in the dictionary, $P_i$ the frequencies of usage of each word, and again use Equation 6.3 to get a revised (and smaller) estimate of the amount of disorder of a message in this more restricted class. That is, real text is more predictable, and so carries even less disorder per letter, than random strings with the letter frequencies of real text.
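The first level of this hierarchy is easy to see numerically. The following sketch (the English sample string is ours, standing in for Russian) applies Equation 6.3 to observed letter frequencies; the result already falls below the $\log_2 26 \approx 4.70$ bits per letter of a uniformly random alphabet, and a word-frequency count would push the estimate lower still:

```python
import math
from collections import Counter

def bits_per_symbol(seq):
    """Equation 6.3 with K = 1/ln 2, using observed symbol frequencies."""
    counts = Counter(seq)
    N = len(seq)
    return -sum((n / N) * math.log2(n / N) for n in counts.values())

sample = "it is always best to test theory with experiment"
letters = sample.replace(" ", "")
print(bits_per_symbol(letters))   # below log2(26) ~ 4.70 bits per letter
print(math.log2(26))
```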
Shannon’s formula has some sensible features. First notice thatI is always positive, since
the logarithm of a number smaller than 1 is always negative. If every letter is equally probable,
Pi=1/M,then Equation 6.3 just reproduces our original proposal,I=KNlnM.If, on the other
hand, we know that every letter is an “A,” thenP 1 =1,all the otherPi=0and we findI=0: A
string of all “A’s” is perfectly predictable and has zero disorder. Since Equation 6.3 makes sense
and came from Equation 6.1, we’ll accept the latter as a good measure of disorder.
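Both limiting cases are easy to verify; here is a minimal sketch (the function name is ours), with the $0 \ln 0 = 0$ convention made explicit:

```python
import math

def shannon(P, K=1.0 / math.log(2.0)):
    """I/N = -K * sum_j P_j ln P_j, taking 0 * ln(0) = 0 by continuity."""
    return -K * sum(p * math.log(p) for p in P if p > 0)

M = 33
print(shannon([1.0 / M] * M), math.log2(M))   # uniform: K ln M per letter
print(shannon([1.0] + [0.0] * (M - 1)))       # all "A"s: zero disorder
```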