Understanding Machine Learning: From Theory to Algorithms
5.1 The No-Free-Lunch Theorem

... agnostic PAC model, in which we require that the risk of the output hypothesis will not be much ...
The Bias-Complexity Tradeoff

... $C \times \{0,1\}$ defined by
$$D_i(\{(x,y)\}) = \begin{cases} 1/|C| & \text{if } y = f_i(x) \\ 0 & \text{otherwise.} \end{cases}$$
That is, the probability to ...
... function $h : C \to \{0,1\}$ and every $i$ we have
$$L_{D_i}(h) = \frac{1}{2m}\sum_{x \in C} \mathbb{1}_{[h(x) \neq f_i(x)]} \;\ge\; \frac{1}{2m}\sum_{r=1}^{p} ...$$
Proof Assume, by way of contradiction, that the class is learnable. Choose some $\epsilon < 1/8$ and $\delta$ ...
5.3 Summary

The Estimation Error – the difference between the approximation error and the error achieved by the ERM predictor ...
... be small. In the next chapter we will study in more detail the behavior of the estimation error. ...
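The decomposition above can be made concrete with a toy numeric sketch (the setup is mine, not the text's): ERM over a small finite class of threshold predictors, with labels generated by the threshold $0.5$ plus 10% label noise, so the best-in-class (approximation) error is the noise rate.

```python
import random

# A toy sketch (not from the text): ERM over thresholds h_t(x) = 1[x >= t].
# Labels come from the threshold 0.5 with 10% label noise, so the
# approximation error of the class is the noise rate 0.1.
random.seed(0)

thresholds = [i / 10 for i in range(11)]   # the finite hypothesis class H
noise = 0.1

def sample(m):
    data = []
    for _ in range(m):
        x = random.random()
        y = 1 if x >= 0.5 else 0
        if random.random() < noise:        # flip the label with prob. `noise`
            y = 1 - y
        data.append((x, y))
    return data

def emp_err(t, data):
    return sum(((1 if x >= t else 0) != y) for x, y in data) / len(data)

def true_err(t):
    # exact risk of h_t when x is uniform on [0, 1]: the mass |t - 0.5| where
    # h_t disagrees with the noiseless rule, mixed with the noise rate
    return abs(t - 0.5) * (1 - 2 * noise) + noise

approx_err = min(true_err(t) for t in thresholds)   # 0.1 here
for m in (10, 1000):
    S = sample(m)
    erm_t = min(thresholds, key=lambda t: emp_err(t, S))
    print(m, round(true_err(erm_t) - approx_err, 3))  # estimation error
```

The approximation error stays fixed at 0.1 no matter how many examples are drawn, while the printed estimation error tends to shrink as $m$ grows, which is the behavior the next chapter quantifies.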
6 The VC-Dimension

In the previous chapter, we decomposed the error of the $\mathrm{ERM}_H$ rule into approximation error and estimation error ...
... size. Nevertheless, the following lemma shows that $H$ is learnable in the PAC model using the ERM algorithm. Lemma ...
6.2 The VC-Dimension

... restricting the hypothesis class, for any learning algorithm, an adversary can construct a distribution ...
Corollary 6.4 tells us that if $H$ shatters some set $C$ of size $2m$ then we cannot learn $H$ using $m$ examples. Intuitively, ...
6.3 Examples

6.3.2 Intervals

Let $H$ be the class of intervals over $\mathbb{R}$, namely, $H = \{h_{a,b} : a, b \in \mathbb{R},\ a < b\}$, where $h_{a,b} : \mathbb{R} \to \{0,1\}$ is a ...
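Shattering by intervals can be checked by brute force. A hypothetical helper (the function names are mine): a labeling of real points is realizable by some $h_{a,b}$ exactly when no 0-labeled point lies between two 1-labeled points.

```python
from itertools import product

# Brute-force shattering check for the interval class h_{a,b}(x) = 1[a <= x <= b].
# A labeling is realizable iff no 0-labeled point falls between two 1-labeled ones.

def interval_realizable(points, labels):
    ones = [x for x, y in zip(points, labels) if y == 1]
    if not ones:
        return True                       # an interval disjoint from all points
    a, b = min(ones), max(ones)
    return all(not (a <= x <= b) for x, y in zip(points, labels) if y == 0)

def shattered(points):
    # C is shattered iff every one of the 2^|C| labelings is realizable
    return all(interval_realizable(points, labs)
               for labs in product([0, 1], repeat=len(points)))

print(shattered([1.0, 2.0]))        # True: two points are shattered
print(shattered([1.0, 2.0, 3.0]))   # False: the labeling (1, 0, 1) fails
```

Any two distinct points are shattered, but no three points are, since an interval containing the outer two must contain the middle one; this matches $\mathrm{VCdim} = 2$ for intervals.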
6.3.4 Finite Classes

Let $H$ be a finite class. Then, clearly, for any set $C$ we have $|H_C| \le |H|$ and thus $C$ cannot be shattered ...
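The observation above gives a one-line bound: shattering a set of size $m$ requires $2^m \le |H|$, so $\mathrm{VCdim}(H) \le \lfloor \log_2 |H| \rfloor$. A minimal sketch (the function name is mine):

```python
import math

# Finite-class bound: since |H_C| <= |H|, a set C of size m can be shattered
# only if 2**m <= |H|, hence VCdim(H) <= floor(log2(|H|)).
def vcdim_upper_bound(size_of_H):
    return math.floor(math.log2(size_of_H))

print(vcdim_upper_bound(1000))    # -> 9, since 2**9 = 512 <= 1000 < 1024
print(vcdim_upper_bound(2**20))   # -> 20
```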
6.5 Proof of Theorem 6.7

1. $H$ has the uniform convergence property with sample complexity
$$C_1\,\frac{d + \log(1/\delta)}{\epsilon^2} \le m^{\mathrm{UC}}_H(\epsilon,\delta) \le C_2\,\frac{d + \log(1/\delta)}{\epsilon^2}\ ...$$
Definition 6.9 (Growth Function) Let $H$ be a hypothesis class. Then the growth function of $H$, denoted $\tau_H : \mathbb{N} \to \mathbb{N}$, is defined ...
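The growth function can be computed empirically for small cases. A sketch (the helper names are mine): count the distinct labelings that intervals induce on $m$ fixed points, and compare with Sauer's bound $\sum_{i \le d} \binom{m}{i}$ for $d = \mathrm{VCdim} = 2$.

```python
from math import comb

# Empirical tau_H(m) for the interval class on m real points: endpoints chosen
# strictly between/around the points realize every achievable labeling.

def interval_labelings(points):
    pts = sorted(points)
    cuts = ([pts[0] - 1]
            + [(pts[i] + pts[i + 1]) / 2 for i in range(len(pts) - 1)]
            + [pts[-1] + 1])
    labs = {tuple(0 for _ in pts)}       # empty labeling: interval off to the side
    for a in cuts:
        for b in cuts:
            if a < b:
                labs.add(tuple(1 if a <= x <= b else 0 for x in pts))
    return labs

m = 5
tau = len(interval_labelings(list(range(m))))
sauer = sum(comb(m, i) for i in range(3))   # Sauer's bound with d = 2
print(tau, sauer)                            # 16 16: the bound is tight here
```

For intervals the count is the number of contiguous runs of 1s plus the empty labeling, $\binom{m+1}{2} + 1$, which matches Sauer's bound exactly; in general the bound need not be tight.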
Next, define $H' \subseteq H$ to be
$$H' = \{h \in H : \exists h' \in H \text{ s.t. } (1 - h'(c_1), h'(c_2), \ldots, h'(c_m)) = (h(c_1), h(c_2), \ldots, h(c_m)) ...$$
To ensure that the preceding is at most $\epsilon$ we need that
$$m \ge \frac{2d\log(m)}{(\delta\epsilon)^2} + \frac{2d\log(2e/d)}{(\delta\epsilon)^2}.$$
Standard algebraic ...
The expectation on the right-hand side is over a choice of two i.i.d. samples $S = z_1, \ldots, z_m$ and $S' = z'_1, \ldots$ ...
6.6 Summary

The fundamental theorem of learning theory characterizes PAC learnability of classes of binary classifiers ...
6.8 Exercises

2. $H_{\text{at-most-}k} = \{h \in \{0,1\}^X : |\{x : h(x) = 1\}| \le k \ \text{or}\ |\{x : h(x) = 0\}| \le k\}$. Let $X$ be the Boolean hypercube $\{0,1\}^n$. For a ...
As in $H^d_{\mathrm{con}}$, the empty conjunction is interpreted as the all-positive hypothesis. We augment $H^d_{\mathrm{mcon}}$ with the ...