Understanding Machine Learning: From Theory to Algorithms
8.1 Computational Complexity of Learning

number of bits in its representation). For machine learning tasks, the notion of an ...
The Runtime of Learning

domain set, or some measures of the complexity of the hypothesis class with which the algorithm's output ...
8.2 Implementing the ERM Rule

over Z, and input δ ∈ (0,1), when A has access to samples generated i.i.d. by D, A terminates after ...
On a finite input sample S ∈ Z^m, output some h ∈ H that minimizes the empirical loss, L_S(h) = (1/|S|) ∑_{z∈S} ℓ(h, ...
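The truncated definition above is the empirical loss L_S(h) = (1/|S|) ∑_{z∈S} ℓ(h, z). A minimal sketch, specialized to the 0-1 loss for a binary classifier; the names `empirical_loss`, `h`, and `S` are illustrative, not from the text:

```python
def empirical_loss(h, S):
    """Average 0-1 loss of hypothesis h over the sample S = [(x, y), ...]."""
    return sum(1 for x, y in S if h(x) != y) / len(S)

# Usage: a threshold classifier on the real line.
h = lambda x: 1 if x >= 0.5 else 0
S = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 0)]
# h errs only on the last example, so L_S(h) = 1/4.
print(empirical_loss(h, S))  # 0.25
```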
8.2.2 Axis Aligned Rectangles

Let H_n be the class of axis aligned rectangles in R^n, namely, H_n = {h ...
the rectangle with the minimal training error. This procedure is guaranteed to find an ERM hypothesis ...
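In the realizable case, the ERM procedure for axis aligned rectangles can be illustrated by taking the tightest rectangle enclosing all positive examples, which then attains zero training error. A hedged sketch under that assumption; the names `erm_rectangle` and `predict` are illustrative:

```python
def erm_rectangle(examples):
    """examples: list of (x, y) with x a tuple in R^n and y in {0, 1}.
    Returns per-coordinate (low, high) bounds of the tightest axis aligned
    rectangle around the positive points, or None if there are none."""
    positives = [x for x, y in examples if y == 1]
    if not positives:
        return None
    n = len(positives[0])
    return [(min(p[i] for p in positives), max(p[i] for p in positives))
            for i in range(n)]

def predict(rect, x):
    """Label 1 iff x lies inside the rectangle (None labels everything 0)."""
    return 1 if rect and all(lo <= xi <= hi
                             for (lo, hi), xi in zip(rect, x)) else 0

# Usage on a small realizable sample in R^2.
S = [((1, 1), 1), ((2, 3), 1), ((0, 0), 0), ((4, 4), 0)]
rect = erm_rectangle(S)                           # [(1, 2), (1, 3)]
print(all(predict(rect, x) == y for x, y in S))   # True: zero training error
```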
8.3 Efficiently Learnable, but Not by a Proper ERM

Not Efficiently Learnable in the Agnostic Case

As in the case of axis aligned ...
algorithm might return a hypothesis that does not belong to the original hypothesis class; hence t ...
8.4 Hardness of Learning*

to some partial information about it. In that high-level intuitive sense, results about the crypto ...
the class of functions that can be calculated by small Boolean circuits is not efficiently learnable ...
8.7 Exercises

assume that such a hypothesis can be calculated given these O(n) examples in time O(n), and that the empirical risk ...
We wish to reduce the k-coloring problem to ERM over H_n^k: that is, to prove that if there is an algorithm that ...
larger than RP. In particular, it is believed that NP-hard problems cannot be solved by a randomized polynomial ...
...
Part II: From Theory to Algorithms
...
9 Linear Predictors

In this chapter we will study the family of linear predictors, one of the most useful families of hypothesis ...
R^{d+1}. Therefore, h_{w,b}(x) = 〈w,x〉 + b = 〈w′,x′〉. It follows that each affine function in R^d can be rewritten as a ...
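The rewriting step described here, appending b to w and a constant 1 to x so that 〈w,x〉 + b = 〈w′,x′〉, can be checked numerically. A small sketch; the concrete vectors below are illustrative:

```python
import numpy as np

# Homogeneous representation: w' = (b, w1, ..., wd), x' = (1, x1, ..., xd),
# so the affine map <w, x> + b in R^d becomes the linear map <w', x'> in R^{d+1}.
w = np.array([2.0, -1.0, 0.5])
b = -3.0
x = np.array([1.0, 4.0, 2.0])

w_prime = np.concatenate(([b], w))    # w' = (b, w)
x_prime = np.concatenate(([1.0], x))  # x' = (1, x)

print(np.dot(w, x) + b == np.dot(w_prime, x_prime))  # True
```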
9.1 Halfspaces

in the nonseparable case (i.e., the agnostic case) is known to be computationally hard (Ben-David & Simon ...
by y_i. That is, A_{i,j} = y_i x_{i,j}, where x_{i,j} is the j'th element of the vector x_i. Let v be the vector (1,...,1) ∈ R^m. T ...
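The construction above, A_{i,j} = y_i x_{i,j} together with v = (1,...,1), packs the constraints y_i〈w,x_i〉 ≥ 1 (one per example) into the single componentwise inequality Aw ≥ v. A hedged sketch checking this on an illustrative sample; the data and the candidate w are not from the text:

```python
import numpy as np

# A separable sample in R^2: labels y_i in {+1, -1}.
X = np.array([[2.0, 1.0], [3.0, 3.0], [-1.0, -2.0], [-2.0, 0.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

A = y[:, None] * X    # A[i, j] = y_i * x_{i, j}
v = np.ones(len(X))   # v = (1, ..., 1)

# A candidate halfspace: w satisfies every constraint y_i * <w, x_i> >= 1
# exactly when A @ w >= v holds componentwise.
w = np.array([1.0, 0.5])
print(np.all(A @ w >= v))  # True
```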