Understanding Machine Learning: From Theory to Algorithms


in the context of stochastic optimization. See, for example, (Nemirovski & Yudin
1978, Nesterov & Nesterov 2004, Nesterov 2005, Nemirovski, Juditsky, Lan &
Shapiro 2009, Shapiro, Dentcheva & Ruszczyński 2009).
The bound we have derived for strongly convex functions is due to Hazan,
Agarwal & Kale (2007). As mentioned previously, improved bounds have been
obtained in Rakhlin et al. (2012).

14.8 Exercises



  1. Prove Claim 14.10. Hint: Extend the proof of Lemma 13.5.

  2. Prove Corollary 14.14.
  3. Perceptron as a subgradient descent algorithm: Let S = ((x_1, y_1), ..., (x_m, y_m)) ∈ (R^d × {±1})^m. Assume that there exists w ∈ R^d such that for every i ∈ [m] we have y_i〈w, x_i〉 ≥ 1, and let w* be a vector that has the minimal norm among all vectors that satisfy the preceding requirement. Let R = max_i ‖x_i‖. Define a function

     f(w) = max_{i∈[m]} (1 − y_i〈w, x_i〉).


  • Show that min_{w:‖w‖≤‖w*‖} f(w) = 0 and show that any w for which f(w) < 1 separates the examples in S.

  • Show how to calculate a subgradient of f.

  • Describe and analyze the subgradient descent algorithm for this case. Compare the algorithm and the analysis to the Batch Perceptron algorithm given in Section 9.1.2 (a sketch of the update appears below).
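
As a companion to Exercise 3, here is a minimal sketch of subgradient descent on f, not the book's pseudocode. It assumes a fixed step size eta and iteration budget T (both illustrative choices). Since f is a maximum of affine functions, a subgradient at w is −y_j x_j for any index j attaining the maximum, so the step w ← w + η y_j x_j is Perceptron-like; the loop stops once f(w) < 1, at which point w separates S by the first bullet.

```python
import numpy as np

def subgradient_descent_perceptron(X, y, eta=1.0, T=1000):
    """Subgradient descent on f(w) = max_i (1 - y_i <w, x_i>).

    X: (m, d) array of examples; y: (m,) array of labels in {+1, -1}.
    eta and T are illustrative choices, not values prescribed by the text.
    """
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        margins = 1.0 - y * (X @ w)   # 1 - y_i <w, x_i> for every i
        j = int(np.argmax(margins))   # index attaining the max, so f(w) = margins[j]
        if margins[j] < 1.0:          # f(w) < 1, hence w separates the examples in S
            break
        # Each term of the max is affine in w, so -y_j x_j is a subgradient of f at w.
        subgrad = -y[j] * X[j]
        w = w - eta * subgrad         # i.e., w <- w + eta * y_j x_j, a Perceptron-like update
    return w
```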
  4. Variable step size (*): Prove an analog of Theorem 14.8 for SGD with a variable step size, η_t = B/(ρ√t).
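
For orientation, the update the exercise refers to is the usual SGD step with a decaying step size. In the chapter's notation (a sketch, assuming v_t denotes the stochastic subgradient used at round t, and B and ρ play the same roles as in Theorem 14.8):

     w^(t+1) = w^(t) − η_t v_t,    where η_t = B/(ρ√t).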
