Understanding Machine Learning: From Theory to Algorithms



24.8 Exercises



  1. Prove that the maximum likelihood estimator of the variance of a Gaussian
    variable is biased.
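
    A minimal simulation sketch (Python/NumPy) can make the bias visible before
    proving it: the MLE normalizes by m and therefore underestimates the variance
    on average, unlike the 1/(m-1)-normalized estimator. The sample size, number
    of trials, and true parameters below are arbitrary illustrative choices.

    ```python
    import numpy as np

    # Empirical illustration (not a proof) that the MLE of the variance of a
    # Gaussian, which normalizes by m, is biased downward.
    rng = np.random.default_rng(0)
    m, trials, mu, sigma2 = 5, 200_000, 0.0, 1.0   # arbitrary simulation choices

    samples = rng.normal(mu, np.sqrt(sigma2), size=(trials, m))
    mle_var = samples.var(axis=1, ddof=0)       # (1/m) * sum (x_i - mean)^2, the MLE
    unbiased_var = samples.var(axis=1, ddof=1)  # (1/(m-1)) * sum (x_i - mean)^2

    print("true variance:    ", sigma2)
    print("mean of MLE:      ", mle_var.mean())       # close to (m-1)/m * sigma2 = 0.8
    print("mean of unbiased: ", unbiased_var.mean())  # close to 1.0
    ```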

  2. Regularization for Maximum Likelihood: Consider the following regularized
     loss minimization:

     \frac{1}{m}\sum_{i=1}^{m}\log\bigl(1/P_\theta[x_i]\bigr)
       + \frac{1}{m}\Bigl(\log(1/\theta) + \log\bigl(1/(1-\theta)\bigr)\Bigr).


  • Show that the preceding objective is equivalent to the usual empirical error
    had we added two pseudoexamples to the training set. Conclude that the
    regularized maximum likelihood estimator would be

    \hat{\theta} = \frac{1}{m+2}\left(1 + \sum_{i=1}^{m} x_i\right).

    (A numerical sketch of this estimator appears after this exercise.)


  • Derive a high probability bound on |\hat{\theta} - \theta^\star|. Hint: Rewrite this as
    |\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta^\star| and then use the triangle
    inequality and Hoeffding's inequality.

  • Use this to bound the true risk. Hint: Use the fact that now \hat{\theta} \ge \frac{1}{m+2}
    to relate |\hat{\theta} - \theta^\star| to the relative entropy.
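
    Below is a minimal numerical sketch for this exercise, assuming the Bernoulli
    model P_\theta[x = 1] = \theta implied by the form of the estimator above; the
    sample, the true parameter, and the use of SciPy's bounded scalar minimizer
    are illustrative choices, not part of the exercise. The regularization terms
    are exactly the losses of two pseudoexamples, one with x = 1 and one with
    x = 0, so the closed-form estimator should agree with a direct numerical
    minimizer of the objective.

    ```python
    import numpy as np
    from scipy.optimize import minimize_scalar

    # Sketch for Exercise 2, assuming the Bernoulli model P_theta[x=1] = theta,
    # so log(1/P_theta[x]) = -x*log(theta) - (1-x)*log(1-theta).
    rng = np.random.default_rng(1)
    theta_star, m = 0.3, 50                 # arbitrary illustrative choices
    x = rng.binomial(1, theta_star, size=m)

    def regularized_loss(theta):
        # (1/m) sum_i log(1/P_theta[x_i]) + (1/m)(log(1/theta) + log(1/(1-theta))).
        # The two extra terms are the losses of the pseudoexamples x=1 and x=0.
        nll = -(x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum()
        return (nll + np.log(1 / theta) + np.log(1 / (1 - theta))) / m

    closed_form = (1 + x.sum()) / (m + 2)   # \hat{\theta} = (1 + sum_i x_i) / (m + 2)
    numeric = minimize_scalar(regularized_loss,
                              bounds=(1e-6, 1 - 1e-6), method="bounded").x

    print(closed_form, numeric)             # the two values should agree to several digits
    ```
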
  3. • Consider a general optimization problem of the form:

       \max_{c} \; \sum_{y=1}^{k} \nu_y \log(c_y)
         \quad \text{s.t.} \quad c_y > 0, \;\; \sum_{y} c_y = 1,

     where \nu \in \mathbb{R}_+^k is a vector of nonnegative weights. Verify that the
     M step of soft k-means involves solving such an optimization problem.


  • Let c^\star = \frac{1}{\sum_y \nu_y}\,\nu. Show that c^\star is a probability vector.

  • Show that the optimization problem is equivalent to the problem:

    \min_{c} \; D_{\mathrm{RE}}(c^\star \,\|\, c)
      \quad \text{s.t.} \quad c_y > 0, \;\; \sum_{y} c_y = 1.


  • Using properties of the relative entropy, conclude that c^\star is the solution
    to the optimization problem.
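
    The sketch below is a numerical sanity check (not a proof) that
    c^\star = \nu / \sum_y \nu_y attains the maximum of the weighted log-likelihood
    objective over probability vectors, which is what the relative-entropy argument
    above establishes; the weight vector \nu and the comparison against
    Dirichlet-sampled probability vectors are arbitrary illustrative choices.

    ```python
    import numpy as np

    # Check numerically that c* = nu / sum(nu) attains the largest value of
    # sum_y nu_y * log(c_y) among probability vectors.
    rng = np.random.default_rng(2)
    nu = np.array([3.0, 1.0, 0.5, 2.5])   # arbitrary nonnegative weights

    def objective(c):
        return float(np.dot(nu, np.log(c)))

    c_star = nu / nu.sum()

    # Compare against random probability vectors (uniform Dirichlet samples).
    best_random = max(objective(c) for c in rng.dirichlet(np.ones(len(nu)), size=10_000))
    print(objective(c_star), ">=", best_random)   # c* should never be beaten

    # Why this works: for any probability vector c,
    #   sum_y nu_y log(c_y) = (sum_y nu_y) * (sum_y c*_y log(c*_y) - D_RE(c*||c)),
    # so maximizing the objective is the same as minimizing D_RE(c*||c),
    # which is zero exactly when c = c*.
    ```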
