Theorem 21.15  The Online Gradient Descent algorithm enjoys the following regret bound for every $w^\star \in \mathcal{H}$,
\[
\mathrm{Regret}_A(w^\star,T) \;\le\; \frac{\|w^\star\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|v_t\|^2 .
\]
If we further assume that $f_t$ is $\rho$-Lipschitz for all $t$, then setting $\eta = 1/\sqrt{T}$ yields
\[
\mathrm{Regret}_A(w^\star,T) \;\le\; \tfrac{1}{2}\bigl(\|w^\star\|^2 + \rho^2\bigr)\sqrt{T} .
\]
If we further assume that $\mathcal{H}$ is $B$-bounded and we set $\eta = \frac{B}{\rho\sqrt{T}}$, then
\[
\mathrm{Regret}_A(\mathcal{H},T) \;\le\; B\,\rho\,\sqrt{T} .
\]
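To make the update rule concrete, here is a minimal sketch of Online Gradient Descent with projections. It is written under illustrative assumptions not stated in the text: the hypothesis class is taken to be the Euclidean ball of radius $B$, and the losses are the absolute losses $f_t(w) = |\langle w, x_t\rangle - y_t|$, which are convex and $\|x_t\|$-Lipschitz; the function and variable names are hypothetical.

```python
# A minimal sketch of Online Gradient Descent with projection.
# Assumptions (illustrative, not from the text): H is the Euclidean ball of
# radius B, and f_t(w) = |<w, x_t> - y_t| is the absolute loss.
import numpy as np

def ogd(data, eta, B):
    """Run OGD on a list of (x, y) pairs; return the per-round losses f_t(w^(t))."""
    d = data[0][0].shape[0]
    w = np.zeros(d)                       # w^(1) = 0, as used in the proof
    losses = []
    for x, y in data:
        losses.append(abs(w @ x - y))     # suffer f_t(w^(t))
        v = np.sign(w @ x - y) * x        # a subgradient v_t of f_t at w^(t)
        w_half = w - eta * v              # gradient step: w^(t+1/2) = w^(t) - eta v_t
        norm = np.linalg.norm(w_half)
        w = w_half if norm <= B else (B / norm) * w_half   # project back onto H
    return losses
```

The projection step is what the projection lemma in the proof below refers to: replacing $w^{(t+\frac{1}{2})}$ by its closest point in $\mathcal{H}$ can only decrease the distance to any $w^\star \in \mathcal{H}$.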
Proof  The analysis is similar to the analysis of Stochastic Gradient Descent with projections. Using the projection lemma, the definition of $w^{(t+\frac{1}{2})}$, and the definition of subgradients, we have that for every $t$,
\[
\begin{aligned}
\|w^{(t+1)} - w^\star\|^2 - \|w^{(t)} - w^\star\|^2
&= \|w^{(t+1)} - w^\star\|^2 - \|w^{(t+\frac{1}{2})} - w^\star\|^2 \\
&\quad + \|w^{(t+\frac{1}{2})} - w^\star\|^2 - \|w^{(t)} - w^\star\|^2 \\
&\le \|w^{(t+\frac{1}{2})} - w^\star\|^2 - \|w^{(t)} - w^\star\|^2 \\
&= \|w^{(t)} - \eta v_t - w^\star\|^2 - \|w^{(t)} - w^\star\|^2 \\
&= -2\eta \langle w^{(t)} - w^\star, v_t \rangle + \eta^2\|v_t\|^2 \\
&\le -2\eta \bigl(f_t(w^{(t)}) - f_t(w^\star)\bigr) + \eta^2\|v_t\|^2 .
\end{aligned}
\]
Summing over $t$ and observing that the left-hand side is a telescopic sum, we obtain that
\[
\|w^{(T+1)} - w^\star\|^2 - \|w^{(1)} - w^\star\|^2 \;\le\; -2\eta \sum_{t=1}^{T}\bigl(f_t(w^{(t)}) - f_t(w^\star)\bigr) + \eta^2 \sum_{t=1}^{T}\|v_t\|^2 .
\]
Rearranging the inequality and using the fact that $w^{(1)} = 0$, we get that
\[
\begin{aligned}
\sum_{t=1}^{T}\bigl(f_t(w^{(t)}) - f_t(w^\star)\bigr)
&\le \frac{\|w^{(1)} - w^\star\|^2 - \|w^{(T+1)} - w^\star\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|v_t\|^2 \\
&\le \frac{\|w^\star\|^2}{2\eta} + \frac{\eta}{2}\sum_{t=1}^{T}\|v_t\|^2 .
\end{aligned}
\]
This proves the first bound in the theorem. The second bound follows from the assumption that $f_t$ is $\rho$-Lipschitz, which implies that $\|v_t\| \le \rho$: plugging $\|v_t\| \le \rho$ and $\eta = 1/\sqrt{T}$ into the first bound gives it, and the third bound follows similarly from $\eta = B/(\rho\sqrt{T})$ together with $\|w^\star\| \le B$.
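As a toy sanity check (an illustration, not from the text), one can run the sketch above with the step size $\eta = B/(\rho\sqrt{T})$ from the theorem on synthetic data and compare the algorithm's cumulative loss against many sampled competitors $w^\star$ in the ball. The sampled maximum only lower-bounds $\mathrm{Regret}_A(\mathcal{H},T)$, so it must in particular respect the bound $B\rho\sqrt{T}$:

```python
# Toy check of the bound Regret_A(H, T) <= B * rho * sqrt(T),
# reusing ogd() from the sketch above. All data here are synthetic.
import numpy as np

rng = np.random.default_rng(0)
T, d, B = 1000, 5, 1.0
X = rng.standard_normal((T, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # normalize so rho = max_t ||x_t|| = 1
y = np.sign(X @ rng.standard_normal(d))
rho = 1.0

eta = B / (rho * np.sqrt(T))                     # step size from the theorem
alg_loss = sum(ogd(list(zip(X, y)), eta, B))     # sum_t f_t(w^(t))

# The bound holds for every w* in the B-ball; test it against many sampled
# competitors on the sphere of radius B (this lower-bounds the true regret).
W = rng.standard_normal((2000, d))
W = B * W / np.linalg.norm(W, axis=1, keepdims=True)
comp_loss = np.abs(X @ W.T - y[:, None]).sum(axis=0)   # sum_t f_t(w*) for each w*
worst_regret = alg_loss - comp_loss.min()

assert worst_regret <= B * rho * np.sqrt(T)      # guaranteed by the theorem
```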
21.4 The Online Perceptron Algorithm

The Perceptron is a classic online learning algorithm for binary classification with the hypothesis class of homogeneous halfspaces, namely, $\mathcal{H} = \{x \mapsto \operatorname{sign}(\langle w, x\rangle) :$