Pattern Recognition and Machine Learning

2.4. The Exponential Family

Making use of the constraint (2.209), the multinomial distribution in this representation then becomes


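$$
\begin{aligned}
\exp\left\{\sum_{k=1}^{M} x_k \ln\mu_k\right\}
&= \exp\left\{\sum_{k=1}^{M-1} x_k \ln\mu_k + \left(1-\sum_{k=1}^{M-1} x_k\right)\ln\left(1-\sum_{k=1}^{M-1}\mu_k\right)\right\} \\
&= \exp\left\{\sum_{k=1}^{M-1} x_k \ln\left(\frac{\mu_k}{1-\sum_{j=1}^{M-1}\mu_j}\right) + \ln\left(1-\sum_{k=1}^{M-1}\mu_k\right)\right\}.
\end{aligned}
\tag{2.211}
$$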
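
The second equality in (2.211) only regroups terms. As a brief sketch of that step, the factor multiplying the second logarithm can be expanded as

$$
\left(1-\sum_{k=1}^{M-1} x_k\right)\ln\left(1-\sum_{j=1}^{M-1}\mu_j\right)
= \ln\left(1-\sum_{j=1}^{M-1}\mu_j\right) - \sum_{k=1}^{M-1} x_k \ln\left(1-\sum_{j=1}^{M-1}\mu_j\right),
$$

and the last sum then combines with $\sum_{k=1}^{M-1} x_k \ln\mu_k$ to give the logarithm of the ratio in the final line.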


We now identify


$$
\ln\left(\frac{\mu_k}{1-\sum_{j}\mu_j}\right) = \eta_k \tag{2.212}
$$

which we can solve for $\mu_k$ by first summing both sides over $k$ and then rearranging and back-substituting to give


$$
\mu_k = \frac{\exp(\eta_k)}{1+\sum_{j}\exp(\eta_j)}. \tag{2.213}
$$
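
The rearrangement can be spelled out as follows. This is a sketch in which $S$ and $E$ are shorthand introduced here rather than notation from the text, and all sums run over $1,\ldots,M-1$. Exponentiating (2.212) and summing over $k$ gives

$$
\begin{aligned}
\mu_k &= \exp(\eta_k)\,(1 - S), \qquad S \equiv \sum_{j}\mu_j, \\
S &= (1 - S)\,E, \qquad E \equiv \sum_{j}\exp(\eta_j), \\
1 - S &= \frac{1}{1+E}.
\end{aligned}
$$

Substituting $1 - S$ back into the first line yields $\mu_k = \exp(\eta_k)/(1+E)$, which is the result (2.213).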

This is called the softmax function, or the normalized exponential. In this representation, the multinomial distribution therefore takes the form


$$
p(\mathbf{x}|\boldsymbol{\eta}) = \left(1+\sum_{k=1}^{M-1}\exp(\eta_k)\right)^{-1}\exp(\boldsymbol{\eta}^{\mathrm{T}}\mathbf{x}). \tag{2.214}
$$

This is the standard form of the exponential family, with parameter vector $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_{M-1})^{\mathrm{T}}$ in which


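\begin{align}
\mathbf{u}(\mathbf{x}) &= \mathbf{x} \tag{2.215} \\
h(\mathbf{x}) &= 1 \tag{2.216} \\
g(\boldsymbol{\eta}) &= \left(1+\sum_{k=1}^{M-1}\exp(\eta_k)\right)^{-1}. \tag{2.217}
\end{align}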
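
The relations (2.212) to (2.217) can be checked numerically. The following is a minimal sketch using NumPy; the function names (`mu_to_eta`, `eta_to_mu`, `g`, `density`) and the example probabilities are illustrative assumptions, not taken from the text. It maps the first $M-1$ probabilities to natural parameters, inverts that map with the softmax form (2.213), and evaluates $g(\boldsymbol{\eta})\exp(\boldsymbol{\eta}^{\mathrm{T}}\mathbf{x})$ for each 1-of-$M$ state written in the reduced $M-1$ coordinates.

```python
import numpy as np

def mu_to_eta(mu_reduced):
    """Natural parameters from the first M-1 probabilities, as in (2.212)."""
    mu_reduced = np.asarray(mu_reduced, dtype=float)
    mu_last = 1.0 - mu_reduced.sum()      # probability of the M-th state
    return np.log(mu_reduced / mu_last)

def eta_to_mu(eta):
    """Inverse map, the softmax form with the extra 1 in the denominator (2.213)."""
    e = np.exp(np.asarray(eta, dtype=float))
    return e / (1.0 + e.sum())

def g(eta):
    """Normalising coefficient g(eta) from (2.217)."""
    return 1.0 / (1.0 + np.exp(np.asarray(eta, dtype=float)).sum())

def density(x_reduced, eta):
    """p(x | eta) = g(eta) * exp(eta^T x), as in (2.214), with h(x) = 1."""
    return g(eta) * np.exp(np.dot(eta, x_reduced))

# Illustrative example with M = 3 states (values chosen arbitrarily).
mu = np.array([0.2, 0.5, 0.3])
eta = mu_to_eta(mu[:-1])
mu_back = eta_to_mu(eta)

# The three 1-of-M states in reduced coordinates; the last state is all zeros.
states = [np.array([1, 0]), np.array([0, 1]), np.array([0, 0])]
probs = np.array([density(x, eta) for x in states])

print(mu_back)  # should recover the first M-1 entries of mu
print(probs)    # should recover all M entries of mu
```

Because the $M$-th component is eliminated through the constraint, the all-zeros reduced vector corresponds to the $M$-th state, whose probability is $g(\boldsymbol{\eta})$ itself.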


Finally, let us consider the Gaussian distribution. For the univariate Gaussian,
we have


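\begin{align}
p(x|\mu,\sigma^2) &= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}(x-\mu)^2\right\} \tag{2.218} \\
&= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}\mu^2\right\} \tag{2.219}
\end{align}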