76 2. PROBABILITY DISTRIBUTIONS
We can solve for the Lagrange multiplier $\lambda$ by substituting (2.32) into the constraint $\sum_k \mu_k = 1$ to give $\lambda = -N$. Thus we obtain the maximum likelihood solution in the form

$$\mu_k^{\mathrm{ML}} = \frac{m_k}{N} \qquad (2.33)$$

which is the fraction of the $N$ observations for which $x_k = 1$.
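As a minimal sketch, assuming the observations are stored as a Python list of 1-of-K vectors (the function name `multinomial_ml` and this input format are our own choices, not from the text), the estimate (2.33) amounts to counting how many observations have $x_k = 1$ and dividing by $N$:

```python
def multinomial_ml(observations, K):
    """Return the maximum likelihood estimates mu_k = m_k / N of eq. (2.33).

    `observations` is assumed to be a list of 1-of-K vectors, i.e. lists of
    0/1 entries containing exactly one 1. `multinomial_ml` is a hypothetical name.
    """
    N = len(observations)
    # m_k: number of observations whose k-th component equals 1
    m = [sum(x[k] for x in observations) for k in range(K)]
    # fraction of the N observations falling in each state k
    return [m_k / N for m_k in m]


X = [[1, 0, 0],
     [0, 1, 0],
     [1, 0, 0],
     [0, 0, 1]]
print(multinomial_ml(X, K=3))  # [0.5, 0.25, 0.25]
```

Because every 1-of-K vector contributes exactly one count, the estimates automatically satisfy $\sum_k \mu_k^{\mathrm{ML}} = 1$.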
We can consider the joint distribution of the quantities $m_1, \ldots, m_K$, conditioned on the parameters $\boldsymbol{\mu}$ and on the total number $N$ of observations. From (2.29) this takes the form

$$\mathrm{Mult}(m_1, m_2, \ldots, m_K \mid \boldsymbol{\mu}, N) = \binom{N}{m_1 m_2 \ldots m_K} \prod_{k=1}^{K} \mu_k^{m_k} \qquad (2.34)$$

which is known as the multinomial distribution. The normalization coefficient is the number of ways of partitioning $N$ objects into $K$ groups of sizes $m_1, \ldots, m_K$ and is given by

$$\binom{N}{m_1 m_2 \ldots m_K} = \frac{N!}{m_1!\, m_2! \ldots m_K!}. \qquad (2.35)$$
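The sketch below evaluates (2.34) and (2.35) directly. The helper names `multinomial_coefficient` and `multinomial_pmf`, and the values $\boldsymbol{\mu} = (0.5, 0.3, 0.2)^{\mathrm{T}}$ and $N = 4$, are illustrative assumptions. Summing the probabilities over every count vector whose entries add up to $N$ should give 1, as required for a normalized distribution.

```python
from itertools import product
from math import factorial, prod  # math.prod requires Python 3.8+


def multinomial_coefficient(m):
    """N! / (m_1! m_2! ... m_K!) of eq. (2.35), with N = sum(m).

    Uses exact integer arithmetic; the division is always exact.
    """
    N = sum(m)
    return factorial(N) // prod(factorial(m_k) for m_k in m)


def multinomial_pmf(m, mu):
    """Mult(m_1, ..., m_K | mu, N) of eq. (2.34), with N = sum(m)."""
    return multinomial_coefficient(m) * prod(mu_k ** m_k for mu_k, m_k in zip(mu, m))


mu = [0.5, 0.3, 0.2]
N = 4
# Enumerate all count vectors with entries in 0..N and keep those summing to N
total = sum(multinomial_pmf(m, mu)
            for m in product(range(N + 1), repeat=len(mu)) if sum(m) == N)

print(multinomial_coefficient([2, 1, 1]))  # 12
print(round(total, 12))                    # 1.0
```

The coefficient 12 is the number of distinct orderings of four objects split into groups of sizes 2, 1 and 1, i.e. $4!/(2!\,1!\,1!)$.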
Note that the variables $m_k$ are subject to the constraint

$$\sum_{k=1}^{K} m_k = N. \qquad (2.36)$$

2.2.1 The Dirichlet distribution
We now introduce a family of prior distributions for the parameters $\{\mu_k\}$ of the multinomial distribution (2.34). By inspection of the form of the multinomial distribution, we see that the conjugate prior is given by

$$p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k - 1} \qquad (2.37)$$

where $0 \leq \mu_k \leq 1$ and $\sum_k \mu_k = 1$. Here $\alpha_1, \ldots, \alpha_K$ are the parameters of the distribution, and $\boldsymbol{\alpha}$ denotes $(\alpha_1, \ldots, \alpha_K)^{\mathrm{T}}$. Note that, because of the summation constraint, the distribution over the space of the $\{\mu_k\}$ is confined to a simplex of dimensionality $K - 1$, as illustrated for $K = 3$ in Figure 2.4.
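To make the "by inspection" step explicit: by Bayes' theorem, multiplying the prior (2.37) by the $\boldsymbol{\mu}$-dependent factor of the multinomial (2.34) gives a result of the same functional form as the prior. Here $\mathbf{m}$ is shorthand, introduced only for this sketch, for the vector of counts $(m_1, \ldots, m_K)^{\mathrm{T}}$.

$$
p(\boldsymbol{\mu} \mid \mathbf{m}, \boldsymbol{\alpha})
\;\propto\; \prod_{k=1}^{K} \mu_k^{m_k} \, \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}
\;=\; \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}
$$

Each exponent is simply shifted by the corresponding count, so the posterior stays within the same family with parameters $\alpha_k + m_k$.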
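The normalized form for this distribution (see Exercise 2.9) is given by

$$\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1} \qquad (2.38)$$

which is called the Dirichlet distribution. Here $\Gamma(x)$ is the gamma function defined by (1.141), while

$$\alpha_0 = \sum_{k=1}^{K} \alpha_k. \qquad (2.39)$$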
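A minimal sketch of evaluating (2.38) numerically, under our own assumptions: `dirichlet_pdf` is a hypothetical name, the point `mu` must lie strictly inside the simplex, and every $\alpha_k > 0$. The computation uses `math.lgamma` in log space, because $\Gamma(\alpha_0)$ overflows a double once $\alpha_0$ exceeds roughly 171.

```python
from math import exp, lgamma, log


def dirichlet_pdf(mu, alpha):
    """Dir(mu | alpha) of eq. (2.38), assuming all mu_k > 0, sum(mu) == 1, alpha_k > 0."""
    if any(m_k <= 0 for m_k in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("mu must lie strictly inside the simplex")
    alpha0 = sum(alpha)  # eq. (2.39)
    # log of the normalization Gamma(alpha0) / (Gamma(alpha_1) ... Gamma(alpha_K))
    log_norm = lgamma(alpha0) - sum(lgamma(a) for a in alpha)
    # log of prod_k mu_k^(alpha_k - 1)
    log_kernel = sum((a - 1.0) * log(m_k) for m_k, a in zip(mu, alpha))
    return exp(log_norm + log_kernel)


print(round(dirichlet_pdf([0.2, 0.3, 0.5], [1, 1, 1]), 10))  # 2.0
print(round(dirichlet_pdf([0.5, 0.5], [2, 2]), 10))          # 1.5
```

With $\boldsymbol{\alpha} = (1, 1, 1)^{\mathrm{T}}$ the density is the constant $\Gamma(3) = 2$ over the simplex, and with $K = 2$, $\boldsymbol{\alpha} = (2, 2)^{\mathrm{T}}$ it reduces to $6\mu_1\mu_2$, which equals 1.5 at $\mu_1 = \mu_2 = 0.5$.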