Understanding Machine Learning: From Theory to Algorithms


in the context of stochastic optimization. See, for example, (Nemirovski & Yudin
1978, Nesterov & Nesterov 2004, Nesterov 2005, Nemirovski, Juditsky, Lan &
Shapiro 2009, Shapiro, Dentcheva & Ruszczyński 2009).
The bound we have derived for strongly convex functions is due to Hazan,
Agarwal & Kale (2007). As mentioned previously, improved bounds have been
obtained in Rakhlin et al. (2012).

14.8 Exercises



  1. Prove Claim 14.10. Hint: Extend the proof of Lemma 13.5.

  2. Prove Corollary 14.14.
  3. Perceptron as a subgradient descent algorithm: Let S = ((x_1, y_1), ..., (x_m, y_m)) ∈ (R^d × {±1})^m. Assume that there exists w ∈ R^d such that for every i ∈ [m] we have y_i〈w, x_i〉 ≥ 1, and let w* be a vector that has the minimal norm among all vectors that satisfy the preceding requirement. Let R = max_i ‖x_i‖. Define a function

     f(w) = max_{i∈[m]} (1 − y_i〈w, x_i〉).


  • Show that min_{w:‖w‖≤‖w*‖} f(w) = 0 and show that any w for which f(w) < 1 separates the examples in S.

  • Show how to calculate a subgradient of f.

  • Describe and analyze the subgradient descent algorithm for this case. Compare the algorithm and the analysis to the Batch Perceptron algorithm given in Section 9.1.2 (a sketch of the update appears below).
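
As a companion to Exercise 3, here is a minimal sketch of subgradient descent on f, not the book's pseudocode. It assumes a fixed step size eta and iteration budget T (both illustrative choices). Since f is a maximum of affine functions, a subgradient at w is −y_j x_j for any index j attaining the maximum, so the step w ← w + η y_j x_j is Perceptron-like; the loop stops once f(w) < 1, at which point w separates S by the first bullet.

```python
import numpy as np

def subgradient_descent_perceptron(X, y, eta=1.0, T=1000):
    """Subgradient descent on f(w) = max_i (1 - y_i <w, x_i>).

    X: (m, d) array of examples; y: (m,) array of labels in {+1, -1}.
    eta and T are illustrative choices, not values prescribed by the text.
    """
    m, d = X.shape
    w = np.zeros(d)
    for _ in range(T):
        margins = 1.0 - y * (X @ w)   # 1 - y_i <w, x_i> for every i
        j = int(np.argmax(margins))   # index attaining the max, so f(w) = margins[j]
        if margins[j] < 1.0:          # f(w) < 1, hence w separates the examples in S
            break
        # Each term of the max is affine in w, so -y_j x_j is a subgradient of f at w.
        subgrad = -y[j] * X[j]
        w = w - eta * subgrad         # i.e., w <- w + eta * y_j x_j, a Perceptron-like update
    return w
```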
  4. Variable step size (*): Prove an analog of Theorem 14.8 for SGD with a variable step size, η_t = B/(ρ√t).
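
For orientation, the update the exercise refers to is the usual SGD step with a decaying step size. In the chapter's notation (a sketch, assuming v_t denotes the stochastic subgradient used at round t, and B and ρ play the same roles as in Theorem 14.8):

     w^(t+1) = w^(t) − η_t v_t,    where η_t = B/(ρ√t).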
