Pattern Recognition and Machine Learning

2. PROBABILITY DISTRIBUTIONS

Use this result to prove by induction the following result

    (1 + x)^N = \sum_{m=0}^{N} \binom{N}{m} x^m                     (2.263)

which is known as the binomial theorem, and which is valid for all real values of x.
Finally, show that the binomial distribution is normalized, so that

    \sum_{m=0}^{N} \binom{N}{m} \mu^m (1 - \mu)^{N-m} = 1          (2.264)

which can be done by first pulling out a factor (1 - \mu)^N from the summation and
then making use of the binomial theorem.
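The normalization (2.264) can also be checked numerically. A minimal sketch in Python (the helper name `binomial_pmf` and the choice N = 10, μ = 0.3 are illustrative, not from the text):

```python
from math import comb

def binomial_pmf(m, N, mu):
    # Bin(m | N, mu) = C(N, m) mu^m (1 - mu)^(N - m), the summand in (2.264)
    return comb(N, m) * mu**m * (1 - mu) ** (N - m)

N, mu = 10, 0.3
# Summing over all m = 0, ..., N should give 1, in agreement with (2.264).
total = sum(binomial_pmf(m, N, mu) for m in range(N + 1))
```

Pulling out (1 − μ)^N and applying (2.263) with x = μ/(1 − μ) gives the same conclusion analytically.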

2.4 ( ) Show that the mean of the binomial distribution is given by (2.11). To do this,
differentiate both sides of the normalization condition (2.264) with respect to μ and
then rearrange to obtain an expression for the mean of n. Similarly, by differentiating
(2.264) twice with respect to μ and making use of the result (2.11) for the mean of
the binomial distribution, prove the result (2.12) for the variance of the binomial.
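The closed forms E[m] = Nμ and var[m] = Nμ(1 − μ) that this exercise derives can be verified by direct summation over the distribution. A small sketch (the parameter values N = 12, μ = 0.25 are arbitrary choices for illustration):

```python
from math import comb

N, mu = 12, 0.25
# Tabulate the binomial pmf over m = 0, ..., N.
pmf = [comb(N, m) * mu**m * (1 - mu) ** (N - m) for m in range(N + 1)]

# First moment: should equal N * mu, per (2.11).
mean = sum(m * p for m, p in enumerate(pmf))

# Central second moment: should equal N * mu * (1 - mu), per (2.12).
var = sum((m - mean) ** 2 * p for m, p in enumerate(pmf))
```

Here N·μ = 3 and N·μ·(1 − μ) = 2.25, which the sums reproduce to floating-point accuracy.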

2.5 ( ) www In this exercise, we prove that the beta distribution, given by (2.13), is
correctly normalized, so that (2.14) holds. This is equivalent to showing that

    \int_0^1 \mu^{a-1} (1 - \mu)^{b-1} \, d\mu = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.    (2.265)

From the definition (1.141) of the gamma function, we have

    \Gamma(a)\Gamma(b) = \int_0^\infty \exp(-x) x^{a-1} \, dx \int_0^\infty \exp(-y) y^{b-1} \, dy.    (2.266)

Use this expression to prove (2.265) as follows. First bring the integral over y inside
the integrand of the integral over x, next make the change of variable t = y + x
where x is fixed, then interchange the order of the x and t integrations, and finally
make the change of variable x = tμ where t is fixed.
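Before working through the change of variables, it can be reassuring to confirm (2.265) numerically for particular a and b. A sketch using a simple midpoint rule (the helper `beta_norm_integral`, the grid size, and the values a = 3, b = 5 are all illustrative assumptions):

```python
from math import gamma

def beta_norm_integral(a, b, n=100_000):
    # Midpoint-rule approximation of the left-hand side of (2.265):
    # integral over (0, 1) of mu^(a-1) (1 - mu)^(b-1) d(mu).
    h = 1.0 / n
    return h * sum(
        ((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
        for i in range(n)
    )

a, b = 3.0, 5.0
lhs = beta_norm_integral(a, b)
rhs = gamma(a) * gamma(b) / gamma(a + b)  # right-hand side of (2.265)
```

For a, b > 1 the integrand is bounded, so the midpoint rule converges quickly; the two sides agree to high precision.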

2.6 ( ) Make use of the result (2.265) to show that the mean, variance, and mode of the
beta distribution (2.13) are given respectively by

    E[\mu] = \frac{a}{a+b}                                   (2.267)

    var[\mu] = \frac{ab}{(a+b)^2 (a+b+1)}                    (2.268)

    mode[\mu] = \frac{a-1}{a+b-2}.                           (2.269)
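The three closed forms (2.267)-(2.269) can be checked against moments computed directly from the density. A numerical sketch (the helper `beta_pdf`, the grid resolution, and the values a = 4, b = 6 are illustrative choices, not from the text):

```python
from math import gamma

def beta_pdf(mu, a, b):
    # Beta(mu | a, b) = Gamma(a+b) / (Gamma(a) Gamma(b)) * mu^(a-1) (1-mu)^(b-1),
    # cf. (2.13) with the normalizer from (2.265).
    return gamma(a + b) / (gamma(a) * gamma(b)) * mu ** (a - 1) * (1 - mu) ** (b - 1)

a, b = 4.0, 6.0
n = 100_000
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]       # midpoints on (0, 1)
dens = [beta_pdf(x, a, b) for x in grid]

mean = h * sum(x * w for x, w in zip(grid, dens))        # expect a/(a+b) = 0.4, per (2.267)
second = h * sum(x * x * w for x, w in zip(grid, dens))
var = second - mean ** 2                                 # expect ab/((a+b)^2 (a+b+1)), per (2.268)
mode = max(grid, key=lambda x: beta_pdf(x, a, b))        # expect (a-1)/(a+b-2) = 0.375, per (2.269)
```

Here (2.268) gives 24/1100 ≈ 0.02182, and the grid maximum lands within one grid step of the analytic mode 0.375.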