# Pattern Recognition and Machine Learning

(Jeff_L) #1
``2.4. The Exponential Family 115``

Making use of the constraint (2.209), the multinomial distribution in this representation then becomes

$$
\begin{aligned}
\exp\left\{\sum_{k=1}^{M} x_k \ln \mu_k\right\}
&= \exp\left\{\sum_{k=1}^{M-1} x_k \ln \mu_k + \left(1 - \sum_{k=1}^{M-1} x_k\right) \ln\left(1 - \sum_{k=1}^{M-1} \mu_k\right)\right\} \\
&= \exp\left\{\sum_{k=1}^{M-1} x_k \ln\left(\frac{\mu_k}{1 - \sum_{j=1}^{M-1} \mu_j}\right) + \ln\left(1 - \sum_{k=1}^{M-1} \mu_k\right)\right\}.
\end{aligned}
\tag{2.211}
$$

We now identify

$$
\ln\left(\frac{\mu_k}{1 - \sum_j \mu_j}\right) = \eta_k \tag{2.212}
$$

which we can solve for $\mu_k$ by first summing both sides over $k$ and then rearranging and back-substituting to give

$$
\mu_k = \frac{\exp(\eta_k)}{1 + \sum_j \exp(\eta_j)}. \tag{2.213}
$$
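The "summing, rearranging and back-substituting" step can be spelled out as a short worked derivation. This is only a sketch of one way to do it; the shorthand $S$ for the sum of the first $M-1$ probabilities is introduced here for convenience and is not notation from the text.

$$
\begin{aligned}
\mu_k &= (1 - S)\exp(\eta_k), \qquad S \equiv \sum_{j=1}^{M-1} \mu_j \\
S &= (1 - S)\sum_{k=1}^{M-1} \exp(\eta_k)
\;\;\Longrightarrow\;\;
1 - S = \frac{1}{1 + \sum_j \exp(\eta_j)} \\
\mu_k &= \frac{\exp(\eta_k)}{1 + \sum_j \exp(\eta_j)}
\end{aligned}
$$

The first line exponentiates (2.212), the second sums it over $k$ and solves for $1 - S$, and the third substitutes that result back into the first line.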

This is called the softmax function, or the normalized exponential. In this representation, the multinomial distribution therefore takes the form

$$
p(\mathbf{x}|\boldsymbol{\eta}) = \left(1 + \sum_{k=1}^{M-1} \exp(\eta_k)\right)^{-1} \exp(\boldsymbol{\eta}^{\mathrm{T}} \mathbf{x}). \tag{2.214}
$$

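This is the standard form of the exponential family, with parameter vector $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_{M-1})^{\mathrm{T}}$, in which

$$
\begin{aligned}
\mathbf{u}(\mathbf{x}) &= \mathbf{x} && (2.215) \\
h(\mathbf{x}) &= 1 && (2.216) \\
g(\boldsymbol{\eta}) &= \left(1 + \sum_{k=1}^{M-1} \exp(\eta_k)\right)^{-1}. && (2.217)
\end{aligned}
$$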
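As a numerical sanity check on (2.212) to (2.217), the following minimal sketch maps a probability vector to its $M-1$ natural parameters and back, and confirms that the exponential-family form assigns probability $\mu_k$ to the $k$-th one-hot vector. The function names and the example values of `mu` are illustrative assumptions, not part of the text, and no attempt is made at numerical stability for large $|\eta_k|$.

```python
import numpy as np

def eta_from_mu(mu_reduced):
    """Natural parameters (2.212) from the first M-1 probabilities."""
    mu_reduced = np.asarray(mu_reduced, dtype=float)
    return np.log(mu_reduced / (1.0 - mu_reduced.sum()))

def mu_from_eta(eta):
    """Inverse map (2.213): a softmax with the M-th logit fixed at zero."""
    e = np.exp(np.asarray(eta, dtype=float))
    return e / (1.0 + e.sum())

def g(eta):
    """Normalizing coefficient g(eta) from (2.217)."""
    return 1.0 / (1.0 + np.exp(np.asarray(eta, dtype=float)).sum())

def p_x_given_eta(x_reduced, eta):
    """Distribution (2.214), with u(x) = x and h(x) = 1."""
    x_reduced = np.asarray(x_reduced, dtype=float)
    return g(eta) * np.exp(np.asarray(eta, dtype=float) @ x_reduced)

mu = np.array([0.2, 0.5, 0.3])        # illustrative probabilities, M = 3
eta = eta_from_mu(mu[:-1])            # only M-1 = 2 free parameters
mu_back = mu_from_eta(eta)            # should recover mu[:-1]

# Evaluate (2.214) on the first M-1 entries of each one-hot vector.
one_hot = np.eye(len(mu))
probs = np.array([p_x_given_eta(one_hot[k, :-1], eta) for k in range(len(mu))])
```

Here `mu_back` should match `mu[:-1]` and `probs` should match `mu` up to floating-point error; the last state, whose reduced vector is all zeros, receives probability `g(eta)`.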

Finally, let us consider the Gaussian distribution. For the univariate Gaussian, we have

$$
\begin{aligned}
p(x|\mu, \sigma^2) &= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\} && (2.218) \\
&= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}\mu^2\right\} && (2.219)
\end{aligned}
$$
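The step from (2.218) to (2.219) only expands the square in the exponent. A small sketch, assuming arbitrary illustrative values for $\mu$ and $\sigma^2$ and hypothetical function names, checks numerically that the two forms agree:

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density written as in (2.218)."""
    norm = math.sqrt(2.0 * math.pi * sigma2)
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / norm

def gaussian_pdf_expanded(x, mu, sigma2):
    """Same density with the square expanded as in (2.219)."""
    norm = math.sqrt(2.0 * math.pi * sigma2)
    exponent = (-x ** 2 / (2.0 * sigma2)
                + mu * x / sigma2
                - mu ** 2 / (2.0 * sigma2))
    return math.exp(exponent) / norm

mu, sigma2 = 1.5, 0.8                 # illustrative parameter values
points = [-2.0, 0.0, 1.5, 3.0]
pairs = [(gaussian_pdf(x, mu, sigma2), gaussian_pdf_expanded(x, mu, sigma2))
         for x in points]
```

Each pair in `pairs` should contain two values that agree up to floating-point rounding. The expanded form makes the terms in $x^2$ and $x$ explicit.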