Understanding Machine Learning: From Theory to Algorithms


Linear Predictors


9.4 Summary


The family of linear predictors is one of the most useful hypothesis
classes, and many learning algorithms that are widely used in practice
rely on linear predictors. We have shown efficient algorithms for learning linear
predictors with respect to the zero-one loss in the separable case, and with respect
to the squared and logistic losses in the unrealizable case. In later chapters we
will present the properties of the loss function that enable efficient learning.
Naturally, linear predictors are effective whenever we assume, as prior knowledge,
that some linear predictor attains low risk with respect to the underlying
distribution. In the next chapter we show how to construct nonlinear predictors
by composing linear predictors on top of simple classes. This will enable us to
employ linear predictors for a variety of prior knowledge assumptions.
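
As a concrete reminder of the squared-loss case, here is a minimal least-squares sketch; it assumes numpy, and the names X, y, and least_squares are ours rather than the book's:

```python
import numpy as np

def least_squares(X, y):
    # ERM for linear regression with the squared loss: the rows of X are
    # the examples x_1, ..., x_m and y holds the targets y_1, ..., y_m.
    A = X.T @ X                    # A = sum_i x_i x_i^T
    b = X.T @ y                    # b = sum_i y_i x_i
    return np.linalg.pinv(A) @ b   # pseudoinverse also covers singular A
```

Using the pseudoinverse returns a minimizer even when A is not invertible, in line with the chapter's discussion of that case.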

9.5 Bibliographic Remarks


The Perceptron algorithm dates back to Rosenblatt (1958). The proof of its
convergence rate is due to Agmon (1954) and Novikoff (1962). Least squares
regression goes back to Gauss (1795), Legendre (1805), and Adrain (1808).

9.6 Exercises



  1. Show how to cast the ERM problem of linear regression with respect to the
     absolute value loss function, ℓ(h, (x, y)) = |h(x) − y|, as a linear program;
     namely, show how to write the problem

     $$\min_{\mathbf{w}} \sum_{i=1}^{m} \left| \langle \mathbf{w}, \mathbf{x}_i \rangle - y_i \right|$$

     as a linear program.
     Hint: Start by proving that for any c ∈ ℝ,

     $$|c| = \min_{a \ge 0} a \quad \text{s.t.} \quad c \le a \ \text{and} \ c \ge -a.$$
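
     To see this reformulation end to end, here is a minimal sketch of the
     resulting linear program; it assumes numpy and scipy, and the stacked
     variable layout z = (w, a) and all names are our choices, not part of
     the exercise:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, d = 50, 3
X = rng.normal(size=(m, d))                       # rows are x_1, ..., x_m
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=m)

# Decision variables z = (w, a), with w in R^d free and a >= 0.
# Objective: minimize sum_i a_i, where a_i upper-bounds |<w, x_i> - y_i|.
c = np.concatenate([np.zeros(d), np.ones(m)])

# The hint's two inequalities per example, written as A_ub @ z <= b_ub:
#   <w, x_i> - y_i <= a_i   and   y_i - <w, x_i> <= a_i.
I = np.eye(m)
A_ub = np.block([[X, -I], [-X, -I]])
b_ub = np.concatenate([y, -y])

bounds = [(None, None)] * d + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w = res.x[:d]
print(res.fun, np.abs(X @ w - y).sum())           # the two values should match
```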


  2. Show that the matrix A defined in Equation (9.6) is invertible if and only if
     x_1, ..., x_m span ℝ^d.
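
     Not a proof, but a quick numeric illustration of the equivalence, assuming
     numpy; the helper names are ours:

```python
import numpy as np

def spans_Rd(X):
    # True iff the rows x_1, ..., x_m of X span R^d.
    return np.linalg.matrix_rank(X) == X.shape[1]

def A_invertible(X):
    A = X.T @ X  # A = sum_i x_i x_i^T, as in Equation (9.6)
    return np.linalg.matrix_rank(A) == A.shape[0]

X = np.random.default_rng(1).normal(size=(5, 3))
assert spans_Rd(X) == A_invertible(X)  # the two conditions agree
```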

  3. Show that Theorem 9.1 is tight in the following sense: For any positive integer
     m, there exist a vector w∗ ∈ ℝ^d (for some appropriate d) and a sequence of
     examples {(x_1, y_1), ..., (x_m, y_m)} such that the following hold:

     • R = max_i ‖x_i‖ ≤ 1.

     • ‖w∗‖² = m, and for all i ≤ m, y_i⟨x_i, w∗⟩ ≥ 1. Note that, using the
       notation in Theorem 9.1, we therefore get

       $$B = \min\{ \|\mathbf{w}\| : \forall i \in [m],\; y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \ge 1 \} \le \sqrt{m}.$$
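
     For experimenting with this kind of construction, the following is a
     minimal sketch of the Perceptron with the update rule from the chapter,
     w ← w + y_i x_i on a mistaken example; numpy and all names here are our
     choices, not the book's:

```python
import numpy as np

def perceptron(X, y, max_updates=10**6):
    # Rows of X are the examples x_1, ..., x_m; y holds labels in {-1, +1}.
    w = np.zeros(X.shape[1])
    for updates in range(max_updates):
        mistakes = np.flatnonzero(y * (X @ w) <= 0)  # examples with margin <= 0
        if mistakes.size == 0:
            return w, updates                        # separated: report #updates
        i = mistakes[0]
        w = w + y[i] * X[i]                          # update: w <- w + y_i x_i
    return w, max_updates
```

     Theorem 9.1 bounds the number of updates by (RB)², so running this on a
     candidate sequence and comparing the returned update count with m gives a
     quick empirical check of tightness.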