Understanding Machine Learning: From Theory to Algorithms


Regularization and Stability


2. Let B be the unit ball of R^d, let H = B, let Z = B × {0,1}^d, and let
ℓ : Z × H → R be defined as follows:

$$\ell(w, (x, \alpha)) = \sum_{i=1}^{d} \alpha_i (x_i - w_i)^2.$$

This problem corresponds to an unsupervised learning task, meaning that we
do not try to predict the label of x. Instead, what we try to do is to find the
“center of mass” of the distribution over B. However, there is a twist, modeled
by the vectors α. Each example is a pair (x, α), where x is the instance and
α indicates which features of x are “active” and which are “turned off.” A
hypothesis is a vector w representing the center of mass of the distribution,
and the loss function is the squared Euclidean distance between x and w, but
only with respect to the “active” elements of x.
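
To make the definition concrete, here is a minimal Python sketch of this loss (the function name `partial_squared_loss` and the example values are ours, purely illustrative):

```python
import numpy as np

def partial_squared_loss(w, x, alpha):
    """Squared Euclidean distance between x and w, restricted to the
    coordinates that alpha marks as active: sum_i alpha_i * (x_i - w_i)^2."""
    return float(np.sum(alpha * (x - w) ** 2))

# Example in d = 2: only the first coordinate is active, so the
# second coordinate of x is ignored entirely.
w = np.array([0.0, 0.0])          # hypothesis (candidate center of mass)
x = np.array([0.6, 0.8])          # instance in the unit ball of R^2
alpha = np.array([1.0, 0.0])      # feature mask
print(partial_squared_loss(w, x, alpha))   # 0.36
```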


  • Show that this problem is learnable using the RLM rule with a sample
    complexity that does not depend on d.

  • Consider a distribution D over Z as follows: x is fixed to be some x_0,
    and each element of α is sampled to be either 1 or 0 with equal
    probability. Show that the rate of uniform convergence of this problem
    grows with d.
    Hint: Let m be a training set size. Show that if d ≫ 2^m, then there is
    a high probability of sampling a set of examples such that there exists
    some j ∈ [d] for which α_j = 1 for all the examples in the training set
    (see the simulation sketch after this list). Show that such a sample
    cannot be ε-representative. Conclude that the sample complexity of
    uniform convergence must grow with log(d).

  • Conclude that if we take d to infinity we obtain a problem that is
    learnable but for which the uniform convergence property does not hold.
    Compare to the fundamental theorem of statistical learning.
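
The hint can be checked empirically. The sketch below (our own, not from the book) draws m masks α uniformly from {0,1}^d and estimates the probability that some coordinate j is active in every example; analytically this probability is 1 − (1 − 2^{−m})^d, which tends to 1 once d ≫ 2^m:

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_some_coordinate_always_active(m, d, trials=2000):
    """Estimate P(exists j in [d] with alpha_j = 1 in all m examples)
    when each mask alpha is uniform over {0,1}^d."""
    hits = 0
    for _ in range(trials):
        alphas = rng.integers(0, 2, size=(m, d))  # m random masks
        # A column whose minimum is 1 is active in every example.
        if np.any(alphas.min(axis=0) == 1):
            hits += 1
    return hits / trials

m = 5
for d in (2**m, 2**(m + 3), 2**(m + 6)):
    est = prob_some_coordinate_always_active(m, d)
    exact = 1 - (1 - 2.0**-m) ** d
    print(f"d = {d:5d}: estimated {est:.3f}, exact {exact:.3f}")
```

For m = 5 the probability climbs from about 0.63 at d = 2^m to essentially 1 at d = 2^{m+6}, which is the regime the hint asks you to exhibit.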
3. Stability and Asymptotic ERM Are Sufficient for Learnability:
We say that a learning rule A is an AERM (Asymptotic Empirical Risk
Minimizer) with rate ε(m) if for every distribution D it holds that


$$\mathbb{E}_{S \sim \mathcal{D}^m}\!\left[ L_S(A(S)) - \min_{h \in \mathcal{H}} L_S(h) \right] \le \epsilon(m).$$

We say that a learning rule A learns a class H with rate ε(m) if for every
distribution D it holds that

$$\mathbb{E}_{S \sim \mathcal{D}^m}\!\left[ L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \right] \le \epsilon(m).$$
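
To see how these two quantities differ, here is a small Monte Carlo sketch (a toy instantiation of our own choosing, not the book's): scalar hypotheses under squared loss, with the ERM rule over a finite class, which is trivially an AERM with rate 0; the learning rate is estimated against a large held-out sample standing in for L_D:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting (ours, not the book's): scalar hypotheses, squared loss.
H = np.linspace(0.0, 1.0, 11)            # a small finite hypothesis class

def sample(n):
    return rng.normal(0.3, 0.2, size=n)  # i.i.d. draws z ~ D

def risk(h, zs):
    return float(np.mean((zs - h) ** 2)) # L_S(h), or approximately L_D(h)

def A(S):
    """ERM over H -- an AERM with rate epsilon(m) = 0."""
    return H[np.argmin([risk(h, S) for h in H])]

m, trials = 20, 300
Z = sample(100_000)                      # large sample as a proxy for D
LD_star = min(risk(h, Z) for h in H)     # approximates min_h L_D(h)

aerm_gaps, learn_gaps = [], []
for _ in range(trials):
    S = sample(m)
    h = A(S)
    aerm_gaps.append(risk(h, S) - min(risk(g, S) for g in H))
    learn_gaps.append(risk(h, Z) - LD_star)

print("AERM gap :", np.mean(aerm_gaps))   # exactly 0 for ERM
print("learn gap:", np.mean(learn_gaps))  # small, shrinks as m grows
```

The first printed gap is identically 0 for ERM, while the second is a strictly positive quantity that decreases with m; the exercise asks you to relate the two in general.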

Prove the following:

Theorem 13.12 If a learning algorithm A is on-average-replace-one-stable
with rate ε_1(m) and is an AERM with rate ε_2(m), then it learns H with rate
ε_1(m) + ε_2(m).
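
A natural way to begin (a sketch of one standard route, not necessarily the book's exact argument) is to decompose the expected excess risk into a generalization term and an optimization term:

$$\mathbb{E}_{S \sim \mathcal{D}^m}\!\left[ L_{\mathcal{D}}(A(S)) - \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) \right] \le \underbrace{\mathbb{E}_S\!\left[ L_{\mathcal{D}}(A(S)) - L_S(A(S)) \right]}_{\text{stability:}\ \le\, \epsilon_1(m)} + \underbrace{\mathbb{E}_S\!\left[ L_S(A(S)) - \min_{h \in \mathcal{H}} L_S(h) \right]}_{\text{AERM:}\ \le\, \epsilon_2(m)}.$$

The inequality holds because E_S[min_h L_S(h)] ≤ min_h E_S[L_S(h)] = min_h L_D(h); the first bracket is controlled by the chapter's identity equating on-average-replace-one stability with the expected generalization gap, and the second is exactly the AERM condition.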