Understanding Machine Learning: From Theory to Algorithms


Linear Predictors


9.4 Summary


The family of linear predictors is one of the most useful hypothesis
classes, and many learning algorithms that are widely used in practice
rely on linear predictors. We have shown efficient algorithms for learning linear
predictors with respect to the zero-one loss in the separable case, and with respect
to the squared and logistic losses in the unrealizable case. In later chapters we
will present the properties of the loss function that enable efficient learning.
Naturally, linear predictors are effective whenever we assume, as prior knowledge,
that some linear predictor attains low risk with respect to the underlying
distribution. In the next chapter we show how to construct nonlinear predictors
by composing linear predictors on top of simple classes. This will enable us to
employ linear predictors for a variety of prior knowledge assumptions.
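
As a concrete reminder of the squared-loss case, here is a minimal least-squares sketch; it assumes numpy, and the names X, y, and least_squares are ours rather than the book's:

```python
import numpy as np

def least_squares(X, y):
    # ERM for linear regression with the squared loss: the rows of X are
    # the examples x_1, ..., x_m and y holds the targets y_1, ..., y_m.
    A = X.T @ X                    # A = sum_i x_i x_i^T
    b = X.T @ y                    # b = sum_i y_i x_i
    return np.linalg.pinv(A) @ b   # pseudoinverse also covers singular A
```

Using the pseudoinverse returns a minimizer even when A is not invertible, in line with the chapter's discussion of that case.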

9.5 Bibliographic Remarks


The Perceptron algorithm dates back to Rosenblatt (1958). The proof of its
convergence rate is due to Agmon (1954) and Novikoff (1962). Least squares
regression goes back to Gauss (1795), Legendre (1805), and Adrain (1808).

9.6 Exercises



  1. Show how to cast the ERM problem of linear regression with respect to the
     absolute value loss function, ℓ(h, (x, y)) = |h(x) − y|, as a linear program;
     namely, show how to write the problem

     $$\min_{\mathbf{w}} \sum_{i=1}^{m} \left| \langle \mathbf{w}, \mathbf{x}_i \rangle - y_i \right|$$

     as a linear program.
     Hint: Start by proving that for any c ∈ ℝ,

     $$|c| = \min_{a \ge 0} a \quad \text{s.t.} \quad c \le a \ \text{and} \ c \ge -a.$$
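
     To see this reformulation end to end, here is a minimal sketch of the
     resulting linear program; it assumes numpy and scipy, and the stacked
     variable layout z = (w, a) and all names are our choices, not part of
     the exercise:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, d = 50, 3
X = rng.normal(size=(m, d))                       # rows are x_1, ..., x_m
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=m)

# Decision variables z = (w, a), with w in R^d free and a >= 0.
# Objective: minimize sum_i a_i, where a_i upper-bounds |<w, x_i> - y_i|.
c = np.concatenate([np.zeros(d), np.ones(m)])

# The hint's two inequalities per example, written as A_ub @ z <= b_ub:
#   <w, x_i> - y_i <= a_i   and   y_i - <w, x_i> <= a_i.
I = np.eye(m)
A_ub = np.block([[X, -I], [-X, -I]])
b_ub = np.concatenate([y, -y])

bounds = [(None, None)] * d + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w = res.x[:d]
print(res.fun, np.abs(X @ w - y).sum())           # the two values should match
```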


  2. Show that the matrix A defined in Equation (9.6) is invertible if and only if
     x_1, ..., x_m span ℝ^d.
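
     Not a proof, but a quick numeric illustration of the equivalence, assuming
     numpy; the helper names are ours:

```python
import numpy as np

def spans_Rd(X):
    # True iff the rows x_1, ..., x_m of X span R^d.
    return np.linalg.matrix_rank(X) == X.shape[1]

def A_invertible(X):
    A = X.T @ X  # A = sum_i x_i x_i^T, as in Equation (9.6)
    return np.linalg.matrix_rank(A) == A.shape[0]

X = np.random.default_rng(1).normal(size=(5, 3))
assert spans_Rd(X) == A_invertible(X)  # the two conditions agree
```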

  3. Show that Theorem 9.1 is tight in the following sense: For any positive integer
     m, there exist a vector w∗ ∈ ℝ^d (for some appropriate d) and a sequence of
     examples {(x_1, y_1), ..., (x_m, y_m)} such that the following hold:

     • R = max_i ‖x_i‖ ≤ 1.

     • ‖w∗‖² = m, and for all i ≤ m, y_i⟨x_i, w∗⟩ ≥ 1. Note that, using the
       notation in Theorem 9.1, we therefore get

       $$B = \min\{ \|\mathbf{w}\| : \forall i \in [m],\; y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \ge 1 \} \le \sqrt{m}.$$
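
     For experimenting with this kind of construction, the following is a
     minimal sketch of the Perceptron with the update rule from the chapter,
     w ← w + y_i x_i on a mistaken example; numpy and all names here are our
     choices, not the book's:

```python
import numpy as np

def perceptron(X, y, max_updates=10**6):
    # Rows of X are the examples x_1, ..., x_m; y holds labels in {-1, +1}.
    w = np.zeros(X.shape[1])
    for updates in range(max_updates):
        mistakes = np.flatnonzero(y * (X @ w) <= 0)  # examples with margin <= 0
        if mistakes.size == 0:
            return w, updates                        # separated: report #updates
        i = mistakes[0]
        w = w + y[i] * X[i]                          # update: w <- w + y_i x_i
    return w, max_updates
```

     Theorem 9.1 bounds the number of updates by (RB)², so running this on a
     candidate sequence and comparing the returned update count with m gives a
     quick empirical check of tightness.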