Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1
7.7 Bibliographic Remarks 97

7.7 Bibliographic Remarks


Our definition of nonuniform learnability is related to the definition of an Occam-
algorithm in Blumer, Ehrenfeucht, Haussler & Warmuth (1987). The concept of
SRM is due to (Vapnik & Chervonenkis 1974, Vapnik 1995). The concept of MDL
is due to (Rissanen 1978, Rissanen 1983). The relation between SRM and MDL
is discussed in Vapnik (1995). These notions are also closely related to the notion
ofregularization(e.g. Tikhonov (1943)). We will elaborate on regularization in
the second part of this book.
The notion of consistency of estimators dates back to Fisher (1922). Our pre-
sentation of consistency follows Steinwart & Christmann (2008), who also derived
several no-free-lunch theorems.

7.8 Exercises



  1. Prove that for any finite classH, and any description languaged:H →
    { 0 , 1 }∗, the VC-dimension ofHis at most 2 sup{|d(h)|:h∈ H}– the maxi-
    mum description length of a predictor inH. Furthermore, ifdis a prefix-free
    description then VCdim(H)≤sup{|d(h)|:h∈H}.

  2. LetH={hn:n∈N}be an infinite countable hypothesis class for binary
    classification. Show that it is impossible to assign weights to the hypotheses
    inHsuch that

    • Hcould be learnt nonuniformly using these weights. That is, the weighting
      functionw:H→[0,1] should satisfy the condition





h∈Hw(h) ≤1.


  • The weights would be monotonically nondecreasing. That is, ifi < j, then
    w(hi)≤w(hj).
    3.• Consider a hypothesis classH=


⋃∞

n=1Hn, where for everyn∈N,Hnis
finite. Find a weighting functionw:H→[0,1] such that


h∈Hw(h)≤
1 and so that for allh∈ H,w(h) is determined byn(h) = min{n:h∈
Hn}and by|Hn(h)|.


  • (*)Define such a functionwwhen for allnHnis countable (possibly
    infinite).



  1. LetHbe some hypothesis class. For anyh∈H, let|h|denote the description
    length ofh, according to some fixed description language. Consider the MDL
    learning paradigm in which the algorithm returns:


hS∈arg minh∈H

[

LS(h) +


|h|+ ln(2/δ)
2 m

]

,

whereSis a sample of sizem. For anyB >0, letHB={h∈ H:|h| ≤B},
and define
h∗B= arg min
h∈HB

LD(h).
Free download pdf