Understanding Machine Learning: From Theory to Algorithms

14 Stochastic Gradient Descent


that $\mathcal{H} = \mathbb{R}^d$ and therefore the projection step does not matter) as follows:

\[
\begin{aligned}
w^{(t+1)} &= w^{(t)} - \frac{1}{\lambda t}\left(\lambda\, w^{(t)} + v_t\right)
          = \left(1 - \frac{1}{t}\right) w^{(t)} - \frac{1}{\lambda t}\, v_t \\
          &= \frac{t-1}{t}\, w^{(t)} - \frac{1}{\lambda t}\, v_t
          = \frac{t-1}{t}\left(\frac{t-2}{t-1}\, w^{(t-1)} - \frac{1}{\lambda (t-1)}\, v_{t-1}\right) - \frac{1}{\lambda t}\, v_t \\
          &= \cdots = -\frac{1}{\lambda t}\sum_{i=1}^{t} v_i. \qquad (14.15)
\end{aligned}
\]
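As a quick numerical sanity check of Equation (14.15), here is a minimal sketch; the dimension, value of $\lambda$, and the random vectors standing in for the subgradients $v_t$ are arbitrary choices for the test, not from the text:

```python
import numpy as np

# Sanity check of Equation (14.15): running the update
# w^(t+1) = w^(t) - (1/(lam*t)) * (lam*w^(t) + v_t)
# should agree with the closed form -(1/(lam*T)) * sum_i v_i.
rng = np.random.default_rng(0)
d, T, lam = 5, 50, 0.1
vs = rng.normal(size=(T, d))   # stand-ins for the subgradients v_1..v_T

# The first step multiplies w^(1) by (1 - 1/1) = 0, so the starting
# point drops out of the recursion entirely; start anywhere.
w = rng.normal(size=d)
for t in range(1, T + 1):
    w = w - (1.0 / (lam * t)) * (lam * w + vs[t - 1])

closed = -vs.sum(axis=0) / (lam * T)   # right-hand side of (14.15)
assert np.allclose(w, closed)          # agrees up to floating-point error
```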

If we assume that the loss function is $\rho$-Lipschitz, it follows that for all $t$ we have $\|v_t\| \le \rho$. By Equation (14.15), $\lambda\, w^{(t)}$ is (up to sign) an average of the subgradients $v_1,\dots,v_{t-1}$, so the triangle inequality gives $\|\lambda\, w^{(t)}\| \le \rho$ as well, which yields

\[
\|\lambda\, w^{(t)} + v_t\| \le 2\rho.
\]

Theorem 14.11 therefore tells us that after performing $T$ iterations we have that

\[
\mathbb{E}[f(\bar{w})] - f(w^\star) \le \frac{4\rho^2}{\lambda T}\,\bigl(1 + \log(T)\bigr).
\]
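To make the rate concrete, the bound can be inverted (an illustrative back-of-the-envelope step, not taken from the text): the guarantee $\mathbb{E}[f(\bar{w})] - f(w^\star) \le \epsilon$ holds as soon as

\[
\frac{4\rho^2}{\lambda T}\bigl(1 + \log(T)\bigr) \le \epsilon
\quad\Longleftrightarrow\quad
\frac{T}{1 + \log(T)} \ge \frac{4\rho^2}{\lambda\,\epsilon},
\]

so on the order of $\tilde{O}\bigl(\rho^2/(\lambda\epsilon)\bigr)$ iterations suffice, where $\tilde{O}$ hides the logarithmic factor.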

14.6 Summary


We have introduced the Gradient Descent and Stochastic Gradient Descent algo-
rithms, along with several of their variants. We have analyzed their convergence
rate and calculated the number of iterations that would guarantee an expected
objective of at most $\epsilon$ plus the optimal objective. Most importantly, we have
shown that by using SGD we can directly minimize the risk function. We do
so by sampling a point i.i.d. from $\mathcal{D}$ and using a subgradient of the loss of the
current hypothesis $w^{(t)}$ at this point as an unbiased estimate of the gradient (or
a subgradient) of the risk function. This implies that a bound on the number of
iterations also yields a sample complexity bound. Finally, we have also shown
how to apply the SGD method to the problem of regularized risk minimization.
In future chapters we show how this yields extremely simple solvers to some
optimization problems associated with regularized risk minimization.
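The following sketch ties these pieces together: SGD run directly on a regularized risk, drawing one fresh i.i.d. example per step and averaging the iterates. It assumes a synthetic data distribution and the hinge loss; none of the names or parameter values below come from the text.

```python
import numpy as np

# Illustrative SGD for regularized risk minimization:
#   f(w) = (lam/2)||w||^2 + E_{(x,y)~D}[ max(0, 1 - y<w, x>) ].
rng = np.random.default_rng(1)
d, T, lam = 10, 10_000, 0.1
w_hidden = rng.normal(size=d)   # hypothetical direction defining the labels

def sample():
    """Draw one i.i.d. example (x, y) from a synthetic distribution D."""
    x = rng.normal(size=d)
    return x, np.sign(x @ w_hidden)

w = np.zeros(d)
w_sum = np.zeros(d)
for t in range(1, T + 1):
    x, y = sample()
    # Subgradient of the hinge loss at the current w: an unbiased
    # estimate of a subgradient of the (unregularized) risk.
    v = -y * x if y * (w @ x) < 1 else np.zeros(d)
    # Update with step size 1/(lam * t) on the regularized objective.
    w = w - (1.0 / (lam * t)) * (lam * w + v)
    w_sum += w

w_bar = w_sum / T   # averaged iterate, as in the analysis above
```

Note that, by Equation (14.15), the iterate maintained here always equals $-\frac{1}{\lambda t}\sum_{i=1}^{t} v_i$, a scaled sum of the subgradients seen so far; this structure is part of what makes the solvers mentioned above so simple.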

14.7 Bibliographic Remarks


SGD dates back to Robbins & Monro (1951). It is especially effective in large
scale machine learning problems. See, for example, (Murata 1998, Le Cun 2004,
Zhang 2004, Bottou & Bousquet 2008, Shalev-Shwartz, Singer & Srebro 2007,
Shalev-Shwartz & Srebro 2008). In the optimization community it was studied