Understanding Machine Learning: From Theory to Algorithms
9.1 Halfspaces

y_i⟨w⋆, x_i⟩ ≥ 1 for all i, and among all vectors that satisfy these constraints, w⋆ is of minimal norm. The idea of ...
Remark 9.1 The Perceptron is simple to implement and is guaranteed to converge. However, the convergence ...
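The Batch Perceptron discussed in this section is short enough to write out in full. Below is a minimal NumPy sketch; the function name and the stopping logic are my own, and it assumes the sample is linearly separable (otherwise the loop never terminates).

import numpy as np

def perceptron(X, y):
    """Batch Perceptron. X is an m x d matrix of instances, y in {-1,+1}^m.
    While some example satisfies y_i <w, x_i> <= 0, add y_i x_i to w."""
    m, d = X.shape
    w = np.zeros(d)
    while True:
        margins = y * (X @ w)
        violated = np.flatnonzero(margins <= 0)
        if violated.size == 0:
            return w  # every example is classified correctly
        i = violated[0]
        w = w + y[i] * X[i]  # the Perceptron update rule

By the convergence theorem, the number of updates is at most (BR)², where R = max_i ||x_i|| and B is the norm of the minimal-norm vector w⋆ satisfying the constraints above.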
9.2 Linear Regression

Figure 9.1 Linear regression for d = 1. For instance, the x-axis may denote the age ...
In the next subsection, we will see how to implement the ERM rule for linear regression with respect to the ...
Or, in matrix form:

    A = (x_1 ··· x_m)(x_1 ··· x_m)^⊤,    (9.7)
    b = (x_1 ··· x_m) ...
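Solving the linear system A w = b above gives the least squares solution; when A is singular, the text's remedy is the pseudoinverse A⁺. A small sketch under those definitions (function name mine); np.linalg.lstsq returns exactly the minimum-norm solution, which coincides with A⁺b.

import numpy as np

def least_squares(X, y):
    """ERM for linear regression with the squared loss.
    X is m x d (rows are the x_i), y has length m.
    Solves A w = b with A = sum_i x_i x_i^T and b = sum_i y_i x_i."""
    A = X.T @ X
    b = X.T @ y
    # lstsq returns a minimum-norm solution even when A is singular,
    # which is the pseudoinverse solution A^+ b discussed in the text.
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w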
We will focus here on the class of one-dimensional, n-degree polynomial regression predictors, namely, H ...
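Polynomial regression reduces to linear regression: map each scalar x to the vector (1, x, x², ..., xⁿ) and learn the coefficient vector with least squares. A sketch reusing the least_squares routine above (function names mine).

import numpy as np

def poly_features(x, n):
    """Map scalar inputs x (length m) to rows (1, x, x^2, ..., x^n)."""
    return np.vander(x, N=n + 1, increasing=True)

def poly_regression(x, y, n):
    # Coefficients (a_0, ..., a_n) of the fitted polynomial
    # p(x) = a_0 + a_1 x + ... + a_n x^n.
    return least_squares(poly_features(x, n), y)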
9.3 Logistic Regression

The hypothesis class is therefore (where for simplicity we are using homogeneous linear functions): H ...
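The ERM problem for logistic regression minimizes the empirical log loss (1/m) ∑_i log(1 + exp(−y_i⟨w, x_i⟩)), a convex objective, so plain gradient descent suffices. A sketch; the step size and iteration count are arbitrary illustrative choices.

import numpy as np

def logistic_erm(X, y, eta=0.1, T=1000):
    """Gradient descent on the log loss. X is m x d, y in {-1,+1}^m."""
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        z = y * (X @ w)
        # Gradient of (1/m) sum_i log(1 + exp(-z_i)) with respect to w.
        g = -(X.T @ (y / (1.0 + np.exp(z)))) / m
        w -= eta * g
    return w

def sigmoid_predict(w, X):
    # phi_sig(<w, x>): a value in (0, 1), read as the probability that y = 1.
    return 1.0 / (1.0 + np.exp(-(X @ w)))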
9.4 Summary

The family of linear predictors is one of the most useful families of hypothesis classes, and ...
9.6 Exercises

Thus, (BR)² ≤ m. When running the Perceptron on this sequence of examples it makes m updates before converging ...
10 Boosting

Boosting is an algorithmic paradigm that grew out of a theoretical question and became a very practical machine learning ...
10.1 Weak Learnability

by Kearns and Valiant in 1988 and solved in 1990 by Robert Schapire, then a graduate student at MIT. ...
This definition is almost identical to the definition of PAC learning, which here we will call strong learning, with ...
To see that, we first show that for every distribution that is consistent with H, there exists a decision ...
threshold θ. Therefore, instead of minimizing over θ ∈ ℝ we can minimize over θ ∈ Θ_j. This already gives us an efficient procedure ...
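Concretely, ERM for decision stumps over a weighted sample only needs to try, for each coordinate j, the finitely many thresholds that separate consecutive sorted values of that coordinate. A straightforward sketch of that finite search (names mine; the further speedup via sorting and incremental updates is not shown):

import numpy as np

def erm_stump(X, y, D):
    """ERM for decision stumps h(x) = b * sign(theta - x[j]).
    X is m x d, y in {-1,+1}^m, D is a distribution over the m examples.
    Returns the (j, theta, b) with minimal D-weighted error."""
    m, d = X.shape
    best = (np.inf, 0, 0.0, 1)  # (error, j, theta, b)
    for j in range(d):
        order = np.argsort(X[:, j])
        xs, ys, ds = X[order, j], y[order], D[order]
        # Candidate thresholds Theta_j: one below all points, the midpoints
        # between consecutive values, and one above all points.
        thetas = np.concatenate(([xs[0] - 1.0],
                                 (xs[:-1] + xs[1:]) / 2.0,
                                 [xs[-1] + 1.0]))
        for theta in thetas:
            pred = np.where(xs < theta, 1.0, -1.0)  # sign(theta - x_j)
            err = ds[pred != ys].sum()
            # b = -1 flips every prediction, so its error is 1 - err.
            for b, e in ((1, err), (-1, 1.0 - err)):
                if e < best[0]:
                    best = (e, j, theta, b)
    _, j, theta, b = best
    return j, theta, b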
10.2 AdaBoost

a distribution over the examples in S, denoted D^(t). That is, D^(t) ∈ ℝ^m_+ and ∑_{i=1}^m D^(t)_i = 1. Then, the booster passes ...
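The full AdaBoost loop is compact. The sketch below follows the pseudocode of this section, with erm_stump from above wrapped as the weak learner; the wrapper and function names are illustrative, and it assumes each weighted error ε_t lies strictly between 0 and 1/2.

import numpy as np

def adaboost(X, y, weak_learner, T):
    """X is m x d, y in {-1,+1}^m. weak_learner(X, y, D) returns a
    function h with h(X) in {-1,+1}^m. Returns base hypotheses and weights."""
    m = X.shape[0]
    D = np.full(m, 1.0 / m)              # D^(1): the uniform distribution
    hyps, ws = [], []
    for t in range(T):
        h = weak_learner(X, y, D)
        pred = h(X)
        eps = D[pred != y].sum()         # weighted error of h_t (assume 0 < eps < 1/2)
        w_t = 0.5 * np.log(1.0 / eps - 1.0)
        D = D * np.exp(-w_t * y * pred)  # upweight misclassified examples
        D /= D.sum()                     # renormalize to a distribution
        hyps.append(h)
        ws.append(w_t)
    return hyps, np.array(ws)

def stump_learner(X, y, D):
    # Wrap erm_stump (defined earlier) as a weak learner returning a predictor.
    j, theta, b = erm_stump(X, y, D)
    return lambda Z: b * np.where(Z[:, j] < theta, 1.0, -1.0)

def boosted_predict(hyps, ws, X):
    # Output hypothesis: the sign of the weighted sum of base predictions.
    return np.sign(sum(w * h(X) for h, w in zip(hyps, ws)))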
is f_T. In addition, denote Z_t = (1/m) ∑_{i=1}^m e^{−y_i f_t(x_i)}. Note that for any hypothesis we have that 1[h(x) ≠ y] ≤ e^{−y h(x)} ...
10.3 Linear Combinations of Base Hypotheses

Finally, using the inequality 1 − a ≤ e^{−a} we have that √(1 − 4γ²) ≤ e^{−4γ²/2} = e^{−2γ²} ...
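Assembling the steps (a restatement under the section's notation, with Z_0 = 1): each ratio Z_t / Z_{t−1} is at most √(1 − 4γ²), so the training error of the output hypothesis decays exponentially in T:

L_S(h_s) \;\le\; Z_T \;=\; \prod_{t=1}^{T} \frac{Z_t}{Z_{t-1}}
\;\le\; \left(\sqrt{1 - 4\gamma^2}\right)^{T} \;\le\; e^{-2\gamma^2 T}.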
(h_1(x), ..., h_T(x)) ∈ ℝ^T, and then applying the (homogeneous) halfspace defined by w on ψ(x). In this section we analyze ...
From this example we obtain that L(H_DS1, T) can shatter any set of T + 1 instances in ...
Figure 10.1 The four types of functions, g, used by the base hypotheses for face recognition (the four panels, A–D, show one type each). The value of g for ...