Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

188 Stochastic Gradient Descent


Lemma 14.1 applies to the GD algorithm withvt=∇f(w(t)). As we will
show later in Lemma 14.7, iffisρ-Lipschitz, then‖∇f(w(t))‖≤ρ. We therefore
satisfy the lemma’s conditions and achieve the following corollary:

corollary14.2 Letfbe a convex,ρ-Lipschitz function, and letw?∈argmin{w:‖w‖≤B}f(w).
If we run the GD algorithm onfforTsteps withη=


B^2
ρ^2 T, then the output
vectorw ̄satisfies

f(w ̄)−f(w?) ≤ B ρ√
T
Furthermore, for every > 0 , to achievef(w ̄)−f(w?)≤, it suffices to run the
GD algorithm for a number of iterations that satisfies

T≥

B^2 ρ^2
^2

14.2 Subgradients


The GD algorithm requires that the functionfbe differentiable. We now gener-
alize the discussion beyond differentiable functions. We will show that the GD
algorithm can be applied to nondifferentiable functions by using a so-called sub-
gradient off(w) atw(t), instead of the gradient.
To motivate the definition of subgradients, recall that for a convex functionf,
the gradient atwdefines the slope of a tangent that lies belowf, that is,
∀u, f(u)≥f(w) +〈u−w,∇f(w)〉. (14.7)

An illustration is given on the left-hand side of Figure 14.2.
The existence of a tangent that lies belowfis an important property of convex
functions, which is in fact an alternative characterization of convexity.

lemma14.3 LetSbe an open convex set. A functionf:S→Ris convex iff
for everyw∈Sthere existsvsuch that

∀u∈S, f(u)≥f(w) +〈u−w,v〉. (14.8)

The proof of this lemma can be found in many convex analysis textbooks (e.g.,
(Borwein & Lewis 2006)). The preceding inequality leads us to the definition of
subgradients.

definition14.4 (Subgradients) A vectorvthat satisfies Equation (14.8) is
called asubgradientoffatw. The set of subgradients offatwis called the
differential setand denoted∂f(w).

An illustration of subgradients is given on the right-hand side of Figure 14.2.
For scalar functions, a subgradient of a convex functionfatwis a slope of a
line that touchesfatwand is not abovefelsewhere.
Free download pdf