Pattern Recognition and Machine Learning

2. PROBABILITY DISTRIBUTIONS

Use this result to prove by induction the following result

    (1 + x)^N = \sum_{m=0}^{N} \binom{N}{m} x^m                     (2.263)

which is known as the binomial theorem, and which is valid for all real values of x.
Finally, show that the binomial distribution is normalized, so that

    \sum_{m=0}^{N} \binom{N}{m} \mu^m (1 - \mu)^{N-m} = 1          (2.264)

which can be done by first pulling out a factor (1 - \mu)^N from the summation and
then making use of the binomial theorem.
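The normalization (2.264) can also be checked numerically. A minimal sketch in Python (the helper name `binomial_pmf` and the choice N = 10, μ = 0.3 are illustrative, not from the text):

```python
from math import comb

def binomial_pmf(m, N, mu):
    # Bin(m | N, mu) = C(N, m) mu^m (1 - mu)^(N - m), the summand in (2.264)
    return comb(N, m) * mu**m * (1 - mu) ** (N - m)

N, mu = 10, 0.3
# Summing over all m = 0, ..., N should give 1, in agreement with (2.264).
total = sum(binomial_pmf(m, N, mu) for m in range(N + 1))
```

Pulling out (1 − μ)^N and applying (2.263) with x = μ/(1 − μ) gives the same conclusion analytically.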

2.4 ( ) Show that the mean of the binomial distribution is given by (2.11). To do this,
differentiate both sides of the normalization condition (2.264) with respect to μ and
then rearrange to obtain an expression for the mean of n. Similarly, by differentiating
(2.264) twice with respect to μ and making use of the result (2.11) for the mean of
the binomial distribution, prove the result (2.12) for the variance of the binomial.
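The closed forms E[m] = Nμ and var[m] = Nμ(1 − μ) that this exercise derives can be verified by direct summation over the distribution. A small sketch (the parameter values N = 12, μ = 0.25 are arbitrary choices for illustration):

```python
from math import comb

N, mu = 12, 0.25
# Tabulate the binomial pmf over m = 0, ..., N.
pmf = [comb(N, m) * mu**m * (1 - mu) ** (N - m) for m in range(N + 1)]

# First moment: should equal N * mu, per (2.11).
mean = sum(m * p for m, p in enumerate(pmf))

# Central second moment: should equal N * mu * (1 - mu), per (2.12).
var = sum((m - mean) ** 2 * p for m, p in enumerate(pmf))
```

Here N·μ = 3 and N·μ·(1 − μ) = 2.25, which the sums reproduce to floating-point accuracy.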

2.5 ( ) www In this exercise, we prove that the beta distribution, given by (2.13), is
correctly normalized, so that (2.14) holds. This is equivalent to showing that

    \int_0^1 \mu^{a-1} (1 - \mu)^{b-1} \, d\mu = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.    (2.265)

From the definition (1.141) of the gamma function, we have

    \Gamma(a)\Gamma(b) = \int_0^\infty \exp(-x) x^{a-1} \, dx \int_0^\infty \exp(-y) y^{b-1} \, dy.    (2.266)

Use this expression to prove (2.265) as follows. First bring the integral over y inside
the integrand of the integral over x, next make the change of variable t = y + x
where x is fixed, then interchange the order of the x and t integrations, and finally
make the change of variable x = tμ where t is fixed.
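Before working through the change of variables, it can be reassuring to confirm (2.265) numerically for particular a and b. A sketch using a simple midpoint rule (the helper `beta_norm_integral`, the grid size, and the values a = 3, b = 5 are all illustrative assumptions):

```python
from math import gamma

def beta_norm_integral(a, b, n=100_000):
    # Midpoint-rule approximation of the left-hand side of (2.265):
    # integral over (0, 1) of mu^(a-1) (1 - mu)^(b-1) d(mu).
    h = 1.0 / n
    return h * sum(
        ((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
        for i in range(n)
    )

a, b = 3.0, 5.0
lhs = beta_norm_integral(a, b)
rhs = gamma(a) * gamma(b) / gamma(a + b)  # right-hand side of (2.265)
```

For a, b > 1 the integrand is bounded, so the midpoint rule converges quickly; the two sides agree to high precision.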

2.6 ( ) Make use of the result (2.265) to show that the mean, variance, and mode of the
beta distribution (2.13) are given respectively by

    E[\mu] = \frac{a}{a+b}                                   (2.267)

    var[\mu] = \frac{ab}{(a+b)^2 (a+b+1)}                    (2.268)

    mode[\mu] = \frac{a-1}{a+b-2}.                           (2.269)
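The three closed forms (2.267)-(2.269) can be checked against moments computed directly from the density. A numerical sketch (the helper `beta_pdf`, the grid resolution, and the values a = 4, b = 6 are illustrative choices, not from the text):

```python
from math import gamma

def beta_pdf(mu, a, b):
    # Beta(mu | a, b) = Gamma(a+b) / (Gamma(a) Gamma(b)) * mu^(a-1) (1-mu)^(b-1),
    # cf. (2.13) with the normalizer from (2.265).
    return gamma(a + b) / (gamma(a) * gamma(b)) * mu ** (a - 1) * (1 - mu) ** (b - 1)

a, b = 4.0, 6.0
n = 100_000
h = 1.0 / n
grid = [(i + 0.5) * h for i in range(n)]       # midpoints on (0, 1)
dens = [beta_pdf(x, a, b) for x in grid]

mean = h * sum(x * w for x, w in zip(grid, dens))        # expect a/(a+b) = 0.4, per (2.267)
second = h * sum(x * x * w for x, w in zip(grid, dens))
var = second - mean ** 2                                 # expect ab/((a+b)^2 (a+b+1)), per (2.268)
mode = max(grid, key=lambda x: beta_pdf(x, a, b))        # expect (a-1)/(a+b-2) = 0.375, per (2.269)
```

Here (2.268) gives 24/1100 ≈ 0.02182, and the grid maximum lands within one grid step of the analytic mode 0.375.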