in the context of stochastic optimization. See, for example, (Nemirovski & Yudin
1978, Nesterov 2004, Nesterov 2005, Nemirovski, Juditsky, Lan &
Shapiro 2009, Shapiro, Dentcheva & Ruszczyński 2009).
The bound we have derived for strongly convex functions is due to Hazan,
Agarwal & Kale (2007). As mentioned previously, improved bounds have been
obtained in Rakhlin et al. (2012).
14.8 Exercises
1. Prove Claim 14.10. Hint: Extend the proof of Lemma 13.5.
2. Prove Corollary 14.14.
3. Perceptron as a subgradient descent algorithm: Let S = ((x_1, y_1), ..., (x_m, y_m)) ∈
(ℝ^d × {±1})^m. Assume that there exists w ∈ ℝ^d such that for every i ∈ [m]
we have y_i〈w, x_i〉 ≥ 1, and let w⋆ be a vector that has the minimal norm
among all vectors that satisfy the preceding requirement. Let R = max_i ‖x_i‖.
Define a function
f(w) = max_{i∈[m]} (1 − y_i〈w, x_i〉).
- Show that min_{w:‖w‖≤‖w⋆‖} f(w) = 0 and show that any w for which f(w) < 1
separates the examples in S.
- Show how to calculate a subgradient of f.
- Describe and analyze the subgradient descent algorithm for this case. Compare
the algorithm and the analysis to the Batch Perceptron algorithm
given in Section 9.1.2.
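The algorithm asked for in the last item can be sketched as follows. This is not a worked solution from the book; the function name, the step size eta, and the iteration budget T are illustrative choices. It uses the fact that, for a maximizing index j, the vector −y_j x_j is a subgradient of f at w:

```python
import numpy as np

def subgradient_descent_perceptron(X, y, eta, T):
    """Subgradient descent on f(w) = max_i (1 - y_i <w, x_i>).

    X has one example per row, y holds labels in {+1, -1}.
    eta and T are assumed hyperparameters (illustrative only).
    """
    w = np.zeros(X.shape[1])
    for _ in range(T):
        margins = 1 - y * (X @ w)
        j = int(np.argmax(margins))   # index attaining the max in f(w)
        if margins[j] <= 0:           # f(w) <= 0: every margin is >= 1
            break
        w = w + eta * y[j] * X[j]     # step along -(subgradient) = y_j x_j
    return w
```

With eta = 1 each update adds y_j x_j for the currently worst-violated example, which is exactly the flavor of update made by the Batch Perceptron of Section 9.1.2.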
4. Variable step size (*): Prove an analog of Theorem 14.8 for SGD with a
variable step size, η_t = B/(ρ√t).
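The variant of SGD the exercise refers to can be sketched as follows. This is a hedged illustration, not the book's pseudocode: grad_oracle is an assumed callback returning a (possibly stochastic) subgradient of norm at most ρ, and B bounds the norm of the comparison point, as in the setting of Theorem 14.8:

```python
import numpy as np

def sgd_variable_step(grad_oracle, B, rho, T, dim):
    """SGD with variable step size eta_t = B / (rho * sqrt(t)).

    Returns the average of the iterates w_1, ..., w_T, matching the
    averaged output analyzed in Theorem 14.8.
    """
    w = np.zeros(dim)
    avg = np.zeros(dim)
    for t in range(1, T + 1):
        eta_t = B / (rho * np.sqrt(t))  # variable step size
        v = grad_oracle(w)              # stochastic subgradient at w
        avg += w / T                    # running average of w_1..w_T
        w = w - eta_t * v
    return avg
```

The only change from the fixed-step algorithm of Theorem 14.8 is that η = B/(ρ√T) is replaced by η_t = B/(ρ√t) inside the loop, so no advance knowledge of T is needed for the step size.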