Understanding Machine Learning: From Theory to Algorithms

7.7 Bibliographic Remarks 97

7.7 Bibliographic Remarks

Our definition of nonuniform learnability is related to the definition of an Occam- algorithm in Blumer, Ehrenfeucht, Haussler & Warmuth (1987). The concept of SRM is due to (Vapnik & Chervonenkis 1974, Vapnik 1995). The concept of MDL is due to (Rissanen 1978, Rissanen 1983). The relation between SRM and MDL is discussed in Vapnik (1995). These notions are also closely related to the notion ofregularization(e.g. Tikhonov (1943)). We will elaborate on regularization in the second part of this book. The notion of consistency of estimators dates back to Fisher (1922). Our pre- sentation of consistency follows Steinwart & Christmann (2008), who also derived several no-free-lunch theorems.

7.8 Exercises

Prove that for any finite classH, and any description languaged:H →
{ 0 , 1 }∗, the VC-dimension ofHis at most 2 sup{|d(h)|:h∈ H}– the maxi-
mum description length of a predictor inH. Furthermore, ifdis a prefix-free
description then VCdim(H)≤sup{|d(h)|:h∈H}.

LetH={hn:n∈N}be an infinite countable hypothesis class for binary
classification. Show that it is impossible to assign weights to the hypotheses
inHsuch that

Hcould be learnt nonuniformly using these weights. That is, the weighting
functionw:H→[0,1] should satisfy the condition

∑

h∈Hw(h) ≤1.

The weights would be monotonically nondecreasing. That is, ifi < j, then
w(hi)≤w(hj).
3.• Consider a hypothesis classH=

⋃∞

n=1Hn, where for everyn∈N,Hnis finite. Find a weighting functionw:H→[0,1] such that

∑

h∈Hw(h)≤ 1 and so that for allh∈ H,w(h) is determined byn(h) = min{n:h∈ Hn}and by|Hn(h)|.

(*)Define such a functionwwhen for allnHnis countable (possibly
infinite).

LetHbe some hypothesis class. For anyh∈H, let|h|denote the description
length ofh, according to some fixed description language. Consider the MDL
learning paradigm in which the algorithm returns:

hS∈arg minh∈H

[

LS(h) +

√

|h|+ ln(2/δ) 2 m

]

,

whereSis a sample of sizem. For anyB >0, letHB={h∈ H:|h| ≤B}, and define h∗B= arg min h∈HB

LD(h).

Understanding Machine Learning: From Theory to Algorithms

7.7 Bibliographic Remarks

7.8 Exercises

∑

⋃∞

∑

[

√

]

,

Get our desktop app

Company

Features

Documentation

Resources