Generative Models
24.1.1 Maximum Likelihood Estimation for Continuous Random Variables
Let $X$ be a continuous random variable. Then, for most $x \in \mathbb{R}$ we have $\mathbb{P}[X = x] = 0$ and therefore the definition of likelihood as given before is trivialized. To overcome this technical problem we define the likelihood as the log of the *density* of the probability of $X$ at $x$. That is, given an i.i.d. training set $S = (x_1, \ldots, x_m)$ sampled according to a density distribution $P_\theta$, we define the likelihood of $S$ given $\theta$ as
$$
L(S;\theta) \;=\; \log\!\left(\prod_{i=1}^m P_\theta(x_i)\right) \;=\; \sum_{i=1}^m \log\big(P_\theta(x_i)\big).
$$
As before, the maximum likelihood estimator is a maximizer of $L(S;\theta)$ with respect to $\theta$.
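In code, this definition amounts to summing log-densities over the sample. A minimal sketch in plain Python (the exponential density used here is just an illustrative choice, not taken from the text):

```python
import math

def log_likelihood(S, density):
    # L(S; theta) = sum_i log(P_theta(x_i)) for an i.i.d. sample S.
    return sum(math.log(density(x)) for x in S)

# Illustrative density: Exponential with rate 2.
rate = 2.0
def exp_density(x):
    return rate * math.exp(-rate * x)

S = [0.1, 0.5, 1.2]
print(log_likelihood(S, exp_density))  # equals 3*log(2) - 2*(0.1 + 0.5 + 1.2)
```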
As an example, consider a Gaussian random variable, for which the density function of $X$ is parameterized by $\theta = (\mu, \sigma)$ and is defined as follows:
$$
P_\theta(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right).
$$
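The Gaussian density translates directly into code; a minimal sketch, using nothing beyond the formula above:

```python
import math

def gaussian_density(x, mu, sigma):
    # P_theta(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Sanity check: the standard normal density at x = 0 equals 1/sqrt(2*pi).
print(gaussian_density(0.0, mu=0.0, sigma=1.0))
```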
We can rewrite the likelihood as
$$
L(S;\theta) \;=\; -\frac{1}{2\sigma^2} \sum_{i=1}^m (x_i - \mu)^2 \;-\; m \log\big(\sigma\sqrt{2\pi}\big).
$$
To find a parameter $\theta = (\mu,\sigma)$ that optimizes this we take the derivative of the likelihood w.r.t. $\mu$ and w.r.t. $\sigma$ and compare it to $0$. We obtain the following two equations:
$$
\frac{d}{d\mu} L(S;\theta) \;=\; \frac{1}{\sigma^2} \sum_{i=1}^m (x_i - \mu) \;=\; 0
$$
$$
\frac{d}{d\sigma} L(S;\theta) \;=\; \frac{1}{\sigma^3} \sum_{i=1}^m (x_i - \mu)^2 \;-\; \frac{m}{\sigma} \;=\; 0
$$
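These two stationarity conditions can be checked numerically: at the sample mean and the (biased) sample standard deviation, both derivatives vanish. A sketch on an arbitrary illustrative sample:

```python
import math

S = [0.2, 1.5, -0.7, 2.1, 0.9]  # an arbitrary sample, for illustration only
m = len(S)

# Candidate stationary point: sample mean and (biased) sample standard deviation.
mu_hat = sum(S) / m
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in S) / m)

# The two derivatives of L(S; theta) from the equations above.
dL_dmu = sum(x - mu_hat for x in S) / sigma_hat ** 2
dL_dsigma = sum((x - mu_hat) ** 2 for x in S) / sigma_hat ** 3 - m / sigma_hat

print(dL_dmu, dL_dsigma)  # both are zero up to floating-point error
```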
Solving the preceding equations we obtain the maximum likelihood estimates:
$$
\hat\mu \;=\; \frac{1}{m}\sum_{i=1}^m x_i \qquad \text{and} \qquad \hat\sigma \;=\; \sqrt{\frac{1}{m}\sum_{i=1}^m (x_i - \hat\mu)^2}\,.
$$
Note that the maximum likelihood estimate is not always an unbiased estimator.
For example, while $\hat\mu$ is unbiased, it is possible to show that the estimate $\hat\sigma$ of the variance is biased (Exercise 1).

Simplifying Notation
To simplify our notation, we use $\mathbb{P}[X = x]$ in this chapter to describe both the probability that $X = x$ (for discrete random variables) and the density of the distribution at $x$ (for continuous variables).
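Returning to the bias of the ML variance estimate noted above, a quick Monte Carlo sketch (assuming samples of size $m = 5$ from a standard Gaussian): averaging $\hat\sigma^2$ over many samples gives approximately $\frac{m-1}{m}\sigma^2 = 0.8$ rather than the true variance $1$.

```python
import random

random.seed(0)

mu_true, sigma_true, m = 0.0, 1.0, 5
trials = 20000

# Average the ML variance estimate sigma_hat^2 over many samples of size m.
total = 0.0
for _ in range(trials):
    S = [random.gauss(mu_true, sigma_true) for _ in range(m)]
    mu_hat = sum(S) / m
    total += sum((x - mu_hat) ** 2 for x in S) / m

avg = total / trials
print(avg)  # close to (m - 1) / m = 0.8, illustrating the downward bias
```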