Understanding Machine Learning: From Theory to Algorithms




24.1.1 Maximum Likelihood Estimation for Continuous Random Variables


Let $X$ be a continuous random variable. Then, for most $x \in \mathbb{R}$ we have $P[X = x] = 0$ and therefore the definition of likelihood as given before is trivialized. To overcome this technical problem we define the likelihood as log of the *density* of the probability of $X$ at $x$. That is, given an i.i.d. training set $S = (x_1, \ldots, x_m)$ sampled according to a density distribution $P_\theta$ we define the likelihood of $S$ given $\theta$ as

$$
L(S;\theta) \;=\; \log\left(\prod_{i=1}^{m} P_\theta(x_i)\right) \;=\; \sum_{i=1}^{m} \log\left(P_\theta(x_i)\right).
$$

As before, the maximum likelihood estimator is a maximizer of $L(S;\theta)$ with respect to $\theta$.
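
To make the definition concrete, here is a minimal numerical sketch, assuming NumPy and a user-supplied log-density (the helper name `log_likelihood` is illustrative, not from the text):

```python
import numpy as np

def log_likelihood(S, log_density, theta):
    # L(S; theta) = sum_i log P_theta(x_i), computed from the
    # log-density of each sample point.
    return float(np.sum(log_density(np.asarray(S), theta)))
```

Summing logs rather than multiplying densities is also the numerically stable choice: a product of many values below 1 underflows quickly.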
As an example, consider a Gaussian random variable, for which the density function of $X$ is parameterized by $\theta = (\mu, \sigma)$ and is defined as follows:

$$
P_\theta(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\, \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
$$

We can rewrite the likelihood as

$$
L(S;\theta) \;=\; -\frac{1}{2\sigma^2} \sum_{i=1}^{m} (x_i - \mu)^2 \;-\; m \log\left(\sigma\sqrt{2\pi}\right).
$$

To find a parameter $\theta = (\mu, \sigma)$ that optimizes this we take the derivative of the likelihood w.r.t. $\mu$ and w.r.t. $\sigma$ and compare it to 0. We obtain the following two equations:

$$
\frac{d}{d\mu} L(S;\theta) \;=\; \frac{1}{\sigma^2} \sum_{i=1}^{m} (x_i - \mu) \;=\; 0
$$

$$
\frac{d}{d\sigma} L(S;\theta) \;=\; \frac{1}{\sigma^3} \sum_{i=1}^{m} (x_i - \mu)^2 \;-\; \frac{m}{\sigma} \;=\; 0
$$
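
The per-sample derivatives can be checked symbolically; a quick sketch using SymPy (a verification aid, not part of the text):

```python
import sympy as sp

x = sp.Symbol('x', real=True)
mu = sp.Symbol('mu', real=True)
sigma = sp.Symbol('sigma', positive=True)

# Log-density of a single Gaussian observation.
logp = -(x - mu)**2 / (2 * sigma**2) - sp.log(sigma * sp.sqrt(2 * sp.pi))

print(sp.simplify(sp.diff(logp, mu)))     # (x - mu)/sigma**2
print(sp.simplify(sp.diff(logp, sigma)))  # (x - mu)**2/sigma**3 - 1/sigma
```

Summing these expressions over the $m$ samples gives exactly the two equations above.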

Solving the preceding equations we obtain the maximum likelihood estimates:

$$
\hat{\mu} \;=\; \frac{1}{m}\sum_{i=1}^{m} x_i
\qquad \text{and} \qquad
\hat{\sigma} \;=\; \sqrt{\frac{1}{m}\sum_{i=1}^{m} (x_i - \hat{\mu})^2}\,.
$$
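
As a sanity check, the closed-form estimates can be compared against a direct numerical maximization of $L(S;\theta)$. A minimal sketch, assuming NumPy and SciPy are available (the synthetic data and starting point are arbitrary choices):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
S = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic Gaussian sample

# Closed-form maximum likelihood estimates.
mu_hat = S.mean()
sigma_hat = np.sqrt(np.mean((S - mu_hat) ** 2))

# Negative log-likelihood of the Gaussian model, to be minimized.
def neg_log_likelihood(theta):
    mu, sigma = theta
    return (np.sum((S - mu) ** 2) / (2 * sigma ** 2)
            + len(S) * np.log(sigma * np.sqrt(2 * np.pi)))

res = minimize(neg_log_likelihood, x0=[0.0, 1.0],
               bounds=[(None, None), (1e-6, None)])
print(mu_hat, sigma_hat)  # closed form
print(res.x)              # numerical optimum; should agree closely
```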

Note that the maximum likelihood estimate is not always an unbiased estimator. For example, while $\hat{\mu}$ is unbiased, it is possible to show that the estimate $\hat{\sigma}$ of the variance is biased (Exercise 1).
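
The bias is easy to see empirically: for i.i.d. samples, $\mathbb{E}[\hat{\sigma}^2] = \frac{m-1}{m}\sigma^2$ rather than $\sigma^2$. A quick Monte Carlo sketch (the sample size, seed, and trial count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
m, sigma2, trials = 5, 4.0, 200_000

# Many independent samples of size m from N(0, sigma2).
samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, m))
mu_hat = samples.mean(axis=1, keepdims=True)
sigma2_hat = np.mean((samples - mu_hat) ** 2, axis=1)

print(sigma2_hat.mean())      # approx (m-1)/m * sigma2 = 3.2
print((m - 1) / m * sigma2)   # theoretical expectation of the ML estimate
```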

Simplifying Notation


To simplify our notation, we use $P[X = x]$ in this chapter to describe both the probability that $X = x$ (for discrete random variables) and the density of the distribution at $x$ (for continuous variables).