Understanding Machine Learning: From Theory to Algorithms

Theorem 21.15 The Online Gradient Descent algorithm enjoys the following regret bound for every $\mathbf{w}^\star \in \mathcal{H}$,
\[
\mathrm{Regret}_A(\mathbf{w}^\star,T) \;\le\; \frac{\|\mathbf{w}^\star\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 .
\]
If we further assume that $f_t$ is $\rho$-Lipschitz for all $t$, then setting $\eta = 1/\sqrt{T}$ yields
\[
\mathrm{Regret}_A(\mathbf{w}^\star,T) \;\le\; \tfrac{1}{2}\bigl(\|\mathbf{w}^\star\|^2 + \rho^2\bigr)\sqrt{T} .
\]
If we further assume that $\mathcal{H}$ is $B$-bounded and we set $\eta = \frac{B}{\rho\sqrt{T}}$, then
\[
\mathrm{Regret}_A(\mathcal{H},T) \;\le\; B\,\rho\,\sqrt{T} .
\]
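To make the update being analyzed concrete, here is a minimal Python sketch of Online Gradient Descent with a projection step, assuming (as an illustration, not from the text) that $\mathcal{H}$ is a Euclidean ball of radius $B$; the names `project_onto_ball`, `online_gradient_descent`, and the `subgradient` callback are hypothetical.

```python
import numpy as np

def project_onto_ball(w, B):
    """Euclidean projection onto H = {w : ||w|| <= B} (illustrative choice of H)."""
    norm = np.linalg.norm(w)
    return w if norm <= B else (B / norm) * w

def online_gradient_descent(subgradient, T, dim, eta, B):
    """Run T rounds of OGD; subgradient(t, w) must return some v_t in
    the subdifferential of f_t at w."""
    w = np.zeros(dim)                      # w^(1) = 0, as in the proof
    iterates = []
    for t in range(T):
        iterates.append(w.copy())
        v = subgradient(t, w)              # v_t
        w_half = w - eta * v               # gradient step: w^(t + 1/2)
        w = project_onto_ball(w_half, B)   # projection step: w^(t + 1)
    return iterates
```

With the $\rho$-Lipschitz assumption one would set `eta = 1 / np.sqrt(T)`, and with a $B$-bounded class `eta = B / (rho * np.sqrt(T))`, matching the two step sizes in the theorem.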

Proof The analysis is similar to the analysis of Stochastic Gradient Descent with projections. Using the projection lemma, the definition of $\mathbf{w}^{(t+\frac12)}$, and the definition of subgradients, we have that for every $t$,
\begin{align*}
\|\mathbf{w}^{(t+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2
&= \|\mathbf{w}^{(t+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t+\frac12)}-\mathbf{w}^\star\|^2
 + \|\mathbf{w}^{(t+\frac12)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2 \\
&\le \|\mathbf{w}^{(t+\frac12)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2 \\
&= \|\mathbf{w}^{(t)}-\eta\mathbf{v}_t-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2 \\
&= -2\eta\langle\mathbf{w}^{(t)}-\mathbf{w}^\star,\mathbf{v}_t\rangle + \eta^2\|\mathbf{v}_t\|^2 \\
&\le -2\eta\bigl(f_t(\mathbf{w}^{(t)}) - f_t(\mathbf{w}^\star)\bigr) + \eta^2\|\mathbf{v}_t\|^2 .
\end{align*}
Summing over $t$ and observing that the left-hand side is a telescopic sum, we obtain that
\[
\|\mathbf{w}^{(T+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(1)}-\mathbf{w}^\star\|^2
\;\le\; -2\eta\sum_{t=1}^{T}\bigl(f_t(\mathbf{w}^{(t)}) - f_t(\mathbf{w}^\star)\bigr) + \eta^2\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 .
\]
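To spell out the telescoping step, the sum of the left-hand sides over $t = 1,\dots,T$ collapses because every intermediate term appears once with a plus sign and once with a minus sign:
\[
\sum_{t=1}^{T}\bigl(\|\mathbf{w}^{(t+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(t)}-\mathbf{w}^\star\|^2\bigr)
= \|\mathbf{w}^{(T+1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(1)}-\mathbf{w}^\star\|^2 .
\]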

Rearranging the inequality and using the fact that $\mathbf{w}^{(1)} = \mathbf{0}$, we get that
\begin{align*}
\sum_{t=1}^{T}\bigl(f_t(\mathbf{w}^{(t)}) - f_t(\mathbf{w}^\star)\bigr)
&\le \frac{\|\mathbf{w}^{(1)}-\mathbf{w}^\star\|^2 - \|\mathbf{w}^{(T+1)}-\mathbf{w}^\star\|^2}{2\eta}
 + \frac{\eta}{2}\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 \\
&\le \frac{\|\mathbf{w}^\star\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|\mathbf{v}_t\|^2 .
\end{align*}

This proves the first bound in the theorem. The second bound follows from the assumption that $f_t$ is $\rho$-Lipschitz, which implies that $\|\mathbf{v}_t\| \le \rho$.
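As a quick, non-authoritative sanity check of the third bound (this experiment is an illustration, not part of the text), the following script runs OGD on linear losses $f_t(\mathbf{w}) = \langle\mathbf{v}_t,\mathbf{w}\rangle$ with $\|\mathbf{v}_t\| = 1$, $B = 1$, and $\eta = B/(\rho\sqrt{T})$, and compares the realized regret with $B\rho\sqrt{T}$.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d, B, rho = 1000, 5, 1.0, 1.0
V = rng.normal(size=(T, d))
V /= np.linalg.norm(V, axis=1, keepdims=True)    # ||v_t|| = 1, so rho = 1
eta = B / (rho * np.sqrt(T))

# OGD on linear losses f_t(w) = <v_t, w>; a subgradient of f_t at any w is v_t.
w = np.zeros(d)
algo_loss = 0.0
for t in range(T):
    algo_loss += V[t] @ w
    w = w - eta * V[t]
    norm = np.linalg.norm(w)
    if norm > B:
        w *= B / norm                            # project back onto the ball

# For linear losses the best comparator in the ball is w* = -B * s / ||s||,
# where s = sum_t v_t, so its cumulative loss equals -B * ||s||.
s = V.sum(axis=0)
regret = algo_loss - (-B * np.linalg.norm(s))
print(f"regret = {regret:.2f}, bound B*rho*sqrt(T) = {B * rho * np.sqrt(T):.2f}")
```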

21.4 The Online Perceptron Algorithm


The Perceptron is a classic online learning algorithm for binary classification with the hypothesis class of homogeneous halfspaces, namely, $\mathcal{H} = \{\mathbf{x} \mapsto \mathrm{sign}(\langle\mathbf{w},\mathbf{x}\rangle) : \mathbf{w} \in \mathbb{R}^d\}$.
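As a preview of the algorithm this section develops, here is a minimal sketch of the standard mistake-driven Perceptron update for homogeneous halfspaces; the streaming interface and the name `perceptron` are illustrative assumptions, not part of the text.

```python
import numpy as np

def perceptron(stream, dim):
    """Online Perceptron: stream yields pairs (x_t, y_t) with y_t in {-1, +1}."""
    w = np.zeros(dim)
    mistakes = 0
    for x, y in stream:
        y_hat = 1.0 if w @ x >= 0 else -1.0   # predict sign(<w, x_t>)
        if y_hat != y:                        # on a mistake, move w toward y_t * x_t
            w = w + y * x
            mistakes += 1
    return w, mistakes
```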