76 2. PROBABILITY DISTRIBUTIONS
We can solve for the Lagrange multiplier $\lambda$ by substituting (2.32) into the constraint $\sum_k \mu_k = 1$ to give $\lambda = -N$. Thus we obtain the maximum likelihood solution in the form
$$
\mu_k^{\mathrm{ML}} = \frac{m_k}{N} \qquad (2.33)
$$
which is the fraction of the $N$ observations for which $x_k = 1$.
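As a quick numerical check of (2.33), the sketch below counts $m_k$ from a small, made-up set of one-of-$K$ observations and forms $m_k / N$. It then compares the log-likelihood $\sum_k m_k \ln \mu_k$ at this estimate with randomly drawn points on the simplex; none of them should exceed it. The helper names `ml_estimate` and `log_likelihood` and the data are hypothetical, chosen only for illustration.

```python
import math
import random

def ml_estimate(observations):
    """Return (mu_ML, m) for a list of one-of-K (one-hot) observation vectors."""
    N = len(observations)
    K = len(observations[0])
    # m_k = number of observations with x_k = 1
    m = [sum(x[k] for x in observations) for k in range(K)]
    mu_ml = [m_k / N for m_k in m]  # equation (2.33)
    return mu_ml, m

def log_likelihood(mu, m):
    """sum_k m_k ln mu_k; terms with m_k = 0 contribute nothing."""
    return sum(m_k * math.log(mu_k) for mu_k, m_k in zip(mu, m) if m_k > 0)

# Hypothetical data: N = 5 observations of a K = 3 variable
observations = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
mu_ml, m = ml_estimate(observations)
print(m)      # [3, 1, 1]
print(mu_ml)  # [0.6, 0.2, 0.2]

# No random point on the simplex should achieve a higher log-likelihood
rng = random.Random(0)
best = log_likelihood(mu_ml, m)
for _ in range(1000):
    w = [rng.random() + 1e-12 for _ in range(len(m))]
    s = sum(w)
    mu = [w_k / s for w_k in w]
    assert log_likelihood(mu, m) <= best + 1e-12
```

Because the log-likelihood is concave in $\boldsymbol{\mu}$ on the simplex, the stationary point found with the Lagrange multiplier is its global maximum, which is what the random comparison illustrates.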
We can consider the joint distribution of the quantities $m_1, \ldots, m_K$, conditioned on the parameters $\boldsymbol{\mu}$ and on the total number $N$ of observations. From (2.29) this takes the form
$$
\mathrm{Mult}(m_1, m_2, \ldots, m_K \mid \boldsymbol{\mu}, N) = \binom{N}{m_1 m_2 \ldots m_K} \prod_{k=1}^{K} \mu_k^{m_k} \qquad (2.34)
$$
which is known as the *multinomial* distribution. The normalization coefficient is the number of ways of partitioning $N$ objects into $K$ groups of size $m_1, \ldots, m_K$ and is given by

$$
\binom{N}{m_1 m_2 \ldots m_K} = \frac{N!}{m_1!\, m_2! \cdots m_K!}. \qquad (2.35)
$$
Note that the variables $m_k$ are subject to the constraint

$$
\sum_{k=1}^{K} m_k = N. \qquad (2.36)
$$
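The following Python sketch evaluates (2.34) and (2.35) directly for a small case and checks that, summed over every count vector satisfying (2.36), the probabilities add up to one. It also counts, by brute force, the label sequences of length $N$ that produce a given $(m_1, \ldots, m_K)$, which should agree with the coefficient (2.35). The function names and the values of $\boldsymbol{\mu}$ and $N$ are hypothetical.

```python
import math
from itertools import product

def multinomial_coefficient(m):
    """N! / (m_1! m_2! ... m_K!) with N = sum(m), as in (2.35)."""
    return math.factorial(sum(m)) // math.prod(math.factorial(m_k) for m_k in m)

def multinomial_pmf(m, mu):
    """Mult(m_1, ..., m_K | mu, N) from (2.34), with N = sum(m)."""
    return multinomial_coefficient(m) * math.prod(mu_k ** m_k for mu_k, m_k in zip(mu, m))

def count_vectors(N, K):
    """All (m_1, ..., m_K) of non-negative integers satisfying (2.36)."""
    return [m for m in product(range(N + 1), repeat=K) if sum(m) == N]

mu = [0.5, 0.3, 0.2]  # hypothetical parameters, summing to one
N = 4
vectors = count_vectors(N, len(mu))
total = sum(multinomial_pmf(m, mu) for m in vectors)
print(len(vectors))  # 15
print(abs(total - 1.0) < 1e-12)  # True

# Brute force: label sequences of length N with counts (2, 1, 1)
target = (2, 1, 1)
sequences = sum(
    1 for seq in product(range(3), repeat=N)
    if tuple(seq.count(k) for k in range(3)) == target
)
print(sequences, multinomial_coefficient(target))  # 12 12
```

Using exact integer arithmetic for the coefficient avoids rounding in (2.35); for large $N$ one would typically switch to log-factorials instead.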
2.2.1 The Dirichlet distribution
We now introduce a family of prior distributions for the parameters $\{\mu_k\}$ of the multinomial distribution (2.34). By inspection of the form of the multinomial distribution, we see that the conjugate prior is given by
$$
p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k - 1} \qquad (2.37)
$$
where $0 \leqslant \mu_k \leqslant 1$ and $\sum_k \mu_k = 1$. Here $\alpha_1, \ldots, \alpha_K$ are the parameters of the distribution, and $\boldsymbol{\alpha}$ denotes $(\alpha_1, \ldots, \alpha_K)^{\mathrm{T}}$. Note that, because of the summation constraint, the distribution over the space of the $\{\mu_k\}$ is confined to a *simplex* of dimensionality $K - 1$, as illustrated for $K = 3$ in Figure 2.4.
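The phrase "by inspection" can be made concrete numerically. Multiplying the multinomial (2.34) by the prior (2.37) gives, as a function of $\boldsymbol{\mu}$, a product of powers $\mu_k^{\alpha_k + m_k - 1}$, that is, the same functional form as (2.37) with $\alpha_k$ replaced by $\alpha_k + m_k$; this is what makes the prior conjugate. The sketch below, with hypothetical values of $\boldsymbol{\alpha}$ and $m_k$ and hypothetical helper names, checks that the ratio of prior times likelihood to $\prod_k \mu_k^{\alpha_k + m_k - 1}$ does not depend on $\boldsymbol{\mu}$.

```python
import math
import random

def multinomial_likelihood(m, mu):
    """Mult(m | mu, N) from (2.34), with N = sum(m)."""
    coeff = math.factorial(sum(m)) // math.prod(math.factorial(m_k) for m_k in m)
    return coeff * math.prod(mu_k ** m_k for mu_k, m_k in zip(mu, m))

def unnormalized_prior(mu, alpha):
    """Right-hand side of (2.37): prod_k mu_k^(alpha_k - 1)."""
    return math.prod(mu_k ** (a_k - 1) for mu_k, a_k in zip(mu, alpha))

def random_simplex_point(K, rng):
    """A random point strictly inside the simplex (not uniformly distributed)."""
    w = [rng.random() + 1e-3 for _ in range(K)]
    s = sum(w)
    return [w_k / s for w_k in w]

alpha = [2.0, 1.5, 3.0]  # hypothetical prior parameters
m = [3, 1, 1]            # hypothetical counts, N = 5
updated = [a_k + m_k for a_k, m_k in zip(alpha, m)]  # alpha_k + m_k

rng = random.Random(1)
ratios = []
for _ in range(5):
    mu = random_simplex_point(len(alpha), rng)
    prior_times_lik = unnormalized_prior(mu, alpha) * multinomial_likelihood(m, mu)
    ratios.append(prior_times_lik / unnormalized_prior(mu, updated))

# Every ratio equals the multinomial coefficient 5!/(3! 1! 1!) = 20,
# up to floating-point rounding, whatever mu is.
print(ratios)
```

The constant ratio is exactly the factor absorbed by normalization, so the product has the form (2.37) with parameters $\alpha_k + m_k$.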
The normalized form for this distribution (see Exercise 2.9) is given by
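$$
\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1} \qquad (2.38)
$$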
which is called the *Dirichlet* distribution. Here $\Gamma(x)$ is the gamma function defined by (1.141), while

$$
\alpha_0 = \sum_{k=1}^{K} \alpha_k. \qquad (2.39)
$$
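A minimal sketch of (2.38) and (2.39) in Python, working in log space with `math.lgamma` so that large $\alpha_k$ do not overflow $\Gamma$. The function names and the test values of $\boldsymbol{\alpha}$ are illustrative assumptions, and the density is only evaluated strictly inside the simplex. For $K = 3$ it also integrates the density over the simplex with a simple midpoint rule in the coordinates $(\mu_1, \mu_2)$, taking $\mu_3 = 1 - \mu_1 - \mu_2$; the result should be close to one.

```python
import math

def dirichlet_pdf(mu, alpha):
    """Dir(mu | alpha) from (2.38), evaluated in log space via math.lgamma."""
    if abs(sum(mu) - 1.0) > 1e-9 or any(mu_k <= 0 for mu_k in mu):
        raise ValueError("mu must lie strictly inside the simplex")
    alpha_0 = sum(alpha)  # equation (2.39)
    log_norm = math.lgamma(alpha_0) - sum(math.lgamma(a_k) for a_k in alpha)
    log_kernel = sum((a_k - 1) * math.log(mu_k) for a_k, mu_k in zip(alpha, mu))
    return math.exp(log_norm + log_kernel)

def integrate_over_simplex_k3(alpha, n=200):
    """Midpoint-rule integral of Dir(mu | alpha) over the K = 3 simplex,
    using square cells of side 1/n that lie entirely inside the triangle."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n - 1 - i):  # keeps i + j <= n - 2, so mu_3 > 0
            mu1 = (i + 0.5) * h
            mu2 = (j + 0.5) * h
            total += dirichlet_pdf([mu1, mu2, 1.0 - mu1 - mu2], alpha) * h * h
    return total

uniform_value = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])
print(uniform_value)  # approximately 2 (= Gamma(3))
total = integrate_over_simplex_k3([2.0, 3.0, 4.0])
print(total)          # close to 1
```

With $\boldsymbol{\alpha} = (1, 1, 1)$ the density is constant and equal to $\Gamma(3) = 2$, the reciprocal of the area $1/2$ of the triangle $\{\mu_1, \mu_2 \geqslant 0,\ \mu_1 + \mu_2 \leqslant 1\}$.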