Pattern Recognition and Machine Learning

2.4. The Exponential Family

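Making use of the constraint (2.209), the multinomial distribution in this representation then becomes

$$
\begin{aligned}
\exp\left\{\sum_{k=1}^{M} x_k \ln \mu_k\right\}
&= \exp\left\{\sum_{k=1}^{M-1} x_k \ln \mu_k + \left(1 - \sum_{k=1}^{M-1} x_k\right) \ln\left(1 - \sum_{k=1}^{M-1} \mu_k\right)\right\} \\
&= \exp\left\{\sum_{k=1}^{M-1} x_k \ln\left(\frac{\mu_k}{1 - \sum_{j=1}^{M-1} \mu_j}\right) + \ln\left(1 - \sum_{k=1}^{M-1} \mu_k\right)\right\}.
\end{aligned}
\tag{2.211}
$$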

We now identify

$$
\ln\left(\frac{\mu_k}{1 - \sum_j \mu_j}\right) = \eta_k \tag{2.212}
$$

which we can solve for $\mu_k$ by first summing both sides over $k$ and then rearranging and back-substituting to give

$$
\mu_k = \frac{\exp(\eta_k)}{1 + \sum_j \exp(\eta_j)}. \tag{2.213}
$$
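The intermediate steps can be written out as follows. This is a sketch of the algebra described in the text, in which the shorthand $S = \sum_{j=1}^{M-1} \mu_j$ is introduced here for convenience and is not part of the original notation.

$$
\begin{aligned}
\exp(\eta_k) &= \frac{\mu_k}{1 - S} && \text{(exponentiating (2.212))} \\
\sum_{k=1}^{M-1} \exp(\eta_k) &= \frac{S}{1 - S} && \text{(summing over } k\text{)} \\
1 + \sum_{k=1}^{M-1} \exp(\eta_k) &= \frac{1}{1 - S} && \text{(adding 1 to both sides)} \\
\mu_k = (1 - S)\exp(\eta_k) &= \frac{\exp(\eta_k)}{1 + \sum_{j=1}^{M-1} \exp(\eta_j)} && \text{(back-substituting)}
\end{aligned}
$$

The last line recovers the expression (2.213) for $\mu_k$.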

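This is called the softmax function, or the normalized exponential. In this representation, the multinomial distribution therefore takes the form

$$
p(\mathbf{x}|\boldsymbol{\eta}) = \left(1 + \sum_{k=1}^{M-1} \exp(\eta_k)\right)^{-1} \exp(\boldsymbol{\eta}^{\mathrm{T}} \mathbf{x}). \tag{2.214}
$$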

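This is the standard form of the exponential family, with parameter vector $\boldsymbol{\eta} = (\eta_1, \ldots, \eta_{M-1})^{\mathrm{T}}$, in which

\begin{align}
\mathbf{u}(\mathbf{x}) &= \mathbf{x} \tag{2.215} \\
h(\mathbf{x}) &= 1 \tag{2.216} \\
g(\boldsymbol{\eta}) &= \left(1 + \sum_{k=1}^{M-1} \exp(\eta_k)\right)^{-1}. \tag{2.217}
\end{align}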
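As a numerical sanity check, the following minimal sketch maps probabilities to natural parameters with (2.212), inverts the map with (2.213), and evaluates the density (2.214). The function names and the example values are illustrative assumptions, and x is taken to hold only the first M-1 components of the 1-of-M vector, so that the all-zero x corresponds to state M.

```python
import math

def eta_from_mu(mu):
    """Natural parameters from the first M-1 probabilities, following (2.212)."""
    rest = 1.0 - sum(mu)  # probability of state M, fixed by the sum-to-one constraint
    return [math.log(m / rest) for m in mu]

def mu_from_eta(eta):
    """Inverse map: the normalized exponential of (2.213)."""
    denom = 1.0 + sum(math.exp(e) for e in eta)
    return [math.exp(e) / denom for e in eta]

def density(x, eta):
    """p(x | eta) from (2.214), with h(x) = 1 and g(eta) as in (2.217)."""
    g = 1.0 / (1.0 + sum(math.exp(e) for e in eta))
    return g * math.exp(sum(e * xi for e, xi in zip(eta, x)))

# Hypothetical example with M = 3: mu_3 = 0.3 is implied by the constraint.
mu = [0.2, 0.5]
eta = eta_from_mu(mu)
mu_back = mu_from_eta(eta)
```

With these values, `density([1, 0], eta)` should reproduce the probability of state 1 and `density([0, 0], eta)` the implied probability of state 3, since g(η) reduces to one minus the sum of the first M-1 probabilities.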

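Finally, let us consider the Gaussian distribution. For the univariate Gaussian, we have

\begin{align}
p(x|\mu, \sigma^2) &= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}(x - \mu)^2\right\} \tag{2.218} \\
&= \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}\mu^2\right\} \tag{2.219}
\end{align}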
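The step from (2.218) to (2.219) is just the expansion of the square in the exponent. A short sketch that evaluates both forms side by side is given below; the function names and test points are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Univariate Gaussian density in the form (2.218)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    return norm * math.exp(-((x - mu) ** 2) / (2.0 * sigma2))

def gaussian_pdf_expanded(x, mu, sigma2):
    """Same density with the square expanded, as in (2.219)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma2)
    exponent = (
        -(x ** 2) / (2.0 * sigma2)   # quadratic term in x
        + (mu / sigma2) * x          # linear term in x
        - (mu ** 2) / (2.0 * sigma2) # term independent of x
    )
    return norm * math.exp(exponent)

# Hypothetical test points (x, mu, sigma^2) chosen for illustration.
points = [(0.0, 0.0, 1.0), (1.5, -0.5, 2.0), (-2.0, 3.0, 0.25)]
agree = all(
    math.isclose(gaussian_pdf(*p), gaussian_pdf_expanded(*p), rel_tol=1e-9)
    for p in points
)
```

Separating the exponent into terms in x squared, x, and a constant is what makes the coefficients of x and x squared easy to read off.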
