Understanding Machine Learning: From Theory to Algorithms

Theorem 21.15 The Online Gradient Descent algorithm enjoys the following regret bound for every $\mathbf{w}^\star \in \mathcal{H}$,
\[
\mathrm{Regret}_A(\mathbf{w}^\star,T) \;\le\; \frac{\|\mathbf{w}^\star\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 .
\]
If we further assume that $f_t$ is $\rho$-Lipschitz for all $t$, then setting $\eta = 1/\sqrt{T}$ yields
\[
\mathrm{Regret}_A(\mathbf{w}^\star,T) \;\le\; \tfrac{1}{2}\bigl(\|\mathbf{w}^\star\|^2 + \rho^2\bigr)\sqrt{T} .
\]
If we further assume that $\mathcal{H}$ is $B$-bounded and we set $\eta = \frac{B}{\rho\sqrt{T}}$, then
\[
\mathrm{Regret}_A(\mathcal{H},T) \;\le\; B\,\rho\,\sqrt{T} .
\]
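To make the update being analyzed concrete, here is a minimal Python sketch of Online Gradient Descent with a projection step, assuming (as an illustration, not from the text) that $\mathcal{H}$ is a Euclidean ball of radius $B$; the names `project_onto_ball`, `online_gradient_descent`, and the `subgradient` callback are hypothetical.

```python
import numpy as np

def project_onto_ball(w, B):
    """Euclidean projection onto H = {w : ||w|| <= B} (illustrative choice of H)."""
    norm = np.linalg.norm(w)
    return w if norm <= B else (B / norm) * w

def online_gradient_descent(subgradient, T, dim, eta, B):
    """Run T rounds of OGD; subgradient(t, w) must return some v_t in
    the subdifferential of f_t at w."""
    w = np.zeros(dim)                      # w^(1) = 0, as in the proof
    iterates = []
    for t in range(T):
        iterates.append(w.copy())
        v = subgradient(t, w)              # v_t
        w_half = w - eta * v               # gradient step: w^(t + 1/2)
        w = project_onto_ball(w_half, B)   # projection step: w^(t + 1)
    return iterates
```

With the $\rho$-Lipschitz assumption one would set `eta = 1 / np.sqrt(T)`, and with a $B$-bounded class `eta = B / (rho * np.sqrt(T))`, matching the two step sizes in the theorem.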

Proof The analysis is similar to the analysis of Stochastic Gradient Descent with projections. Using the projection lemma, the definition of $\mathbf{w}^{(t+\frac12)}$, and the definition of subgradients, we have that for every $t$,
\begin{align*}
\|\mathbf{w}^{(t+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2
&= \|\mathbf{w}^{(t+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t+\frac12)}-\mathbf{w}^\star\|^2
 + \|\mathbf{w}^{(t+\frac12)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2 \\
&\le \|\mathbf{w}^{(t+\frac12)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2 \\
&= \|\mathbf{w}^{(t)}-\eta\mathbf{v}_t-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2 \\
&= -2\eta\langle\mathbf{w}^{(t)}-\mathbf{w}^\star,\mathbf{v}_t\rangle + \eta^2\|\mathbf{v}_t\|^2 \\
&\le -2\eta\bigl(f_t(\mathbf{w}^{(t)}) - f_t(\mathbf{w}^\star)\bigr) + \eta^2\|\mathbf{v}_t\|^2 .
\end{align*}
Summing over $t$ and observing that the left-hand side is a telescopic sum, we obtain that
\[
\|\mathbf{w}^{(T+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(1)}-\mathbf{w}^\star\|^2
\;\le\; -2\eta\sum_{t=1}^{T}\bigl(f_t(\mathbf{w}^{(t)}) - f_t(\mathbf{w}^\star)\bigr) + \eta^2\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 .
\]
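To spell out the telescoping step, the sum of the left-hand sides over $t = 1,\dots,T$ collapses because every intermediate term appears once with a plus sign and once with a minus sign:
\[
\sum_{t=1}^{T}\bigl(\|\mathbf{w}^{(t+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2\bigr)
= \|\mathbf{w}^{(T+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(1)}-\mathbf{w}^\star\|^2 .
\]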

Rearranging the inequality and using the fact that $\mathbf{w}^{(1)} = \mathbf{0}$, we get that
\begin{align*}
\sum_{t=1}^{T}\bigl(f_t(\mathbf{w}^{(t)}) - f_t(\mathbf{w}^\star)\bigr)
&\le \frac{\|\mathbf{w}^{(1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(T+1)}-\mathbf{w}^\star\|^2}{2\eta}
 + \frac{\eta}{2}\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 \\
&\le \frac{\|\mathbf{w}^\star\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 .
\end{align*}

This proves the first bound in the theorem. The second bound follows from the assumption that $f_t$ is $\rho$-Lipschitz, which implies that $\|\mathbf{v}_t\| \le \rho$.
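As a quick, non-authoritative sanity check of the third bound (this experiment is an illustration, not part of the text), the following script runs OGD on linear losses $f_t(\mathbf{w}) = \langle\mathbf{v}_t,\mathbf{w}\rangle$ with $\|\mathbf{v}_t\| = 1$, $B = 1$, and $\eta = B/(\rho\sqrt{T})$, and compares the realized regret with $B\rho\sqrt{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, B, rho = 1000, 5, 1.0, 1.0
V = rng.normal(size=(T, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # ||v_t|| = 1, so rho = 1
eta = B / (rho * np.sqrt(T))

# OGD on linear losses f_t(w) = <v_t, w>; a subgradient of f_t at any w is v_t.
w = np.zeros(d)
algo_loss = 0.0
for t in range(T):
    algo_loss += V[t] @ w
    w = w - eta * V[t]
    norm = np.linalg.norm(w)
    if norm > B:
        w *= B / norm                            # project back onto the ball

# For linear losses the best comparator in the ball is w* = -B * s / ||s||,
# where s = sum_t v_t, so its cumulative loss equals -B * ||s||.
s = V.sum(axis=0)
regret = algo_loss - (-B * np.linalg.norm(s))
print(f"regret = {regret:.2f}, bound B*rho*sqrt(T) = {B * rho * np.sqrt(T):.2f}")
```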

21.4 The Online Perceptron Algorithm


The Perceptron is a classic online learning algorithm for binary classification with the hypothesis class of homogeneous halfspaces, namely, $\mathcal{H} = \{\mathbf{x} \mapsto \mathrm{sign}(\langle\mathbf{w},\mathbf{x}\rangle) : \mathbf{w} \in \mathbb{R}^d\}$.
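As a preview of the algorithm this section develops, here is a minimal sketch of the standard mistake-driven Perceptron update for homogeneous halfspaces; the streaming interface and the name `perceptron` are illustrative assumptions, not part of the text.

```python
import numpy as np

def perceptron(stream, dim):
    """Online Perceptron: stream yields pairs (x_t, y_t) with y_t in {-1, +1}."""
    w = np.zeros(dim)
    mistakes = 0
    for x, y in stream:
        y_hat = 1.0 if w @ x >= 0 else -1.0   # predict sign(<w, x_t>)
        if y_hat != y:                        # on a mistake, move w toward y_t * x_t
            w = w + y * x
            mistakes += 1
    return w, mistakes
```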