21.4 The Online Perceptron Algorithm

function and we can take $v_t = 0$. Otherwise, it is easy to verify that $v_t = -y_t x_t$ is in $\partial f_t(w^{(t)})$. We therefore obtain the update rule
\[
w^{(t+1)} =
\begin{cases}
w^{(t)} & \text{if } y_t \langle w^{(t)}, x_t \rangle > 0 \\
w^{(t)} + \eta\, y_t x_t & \text{otherwise.}
\end{cases}
\]

Denote by $M$ the set of rounds in which $\mathrm{sign}(\langle w^{(t)}, x_t \rangle) \neq y_t$. Since $w^{(1)} = 0$, unrolling the update shows that $w^{(t)} = \eta \sum_{i \in M : i < t} y_i x_i$, so on round $t$ the prediction of the Perceptron can be rewritten as
\[
p_t = \mathrm{sign}\big(\langle w^{(t)}, x_t \rangle\big) = \mathrm{sign}\Big( \eta \sum_{i \in M : i < t} y_i \langle x_i, x_t \rangle \Big).
\]
This form implies that the predictions of the Perceptron algorithm and the set
$M$ do not depend on the actual value of $\eta$ as long as $\eta > 0$. We have therefore
obtained the Perceptron algorithm:
Perceptron
initialize: $w^{(1)} = 0$
for $t = 1, 2, \ldots, T$
    receive $x_t$
    predict $p_t = \mathrm{sign}(\langle w^{(t)}, x_t \rangle)$
    if $y_t \langle w^{(t)}, x_t \rangle \le 0$
        $w^{(t+1)} = w^{(t)} + y_t x_t$
    else
        $w^{(t+1)} = w^{(t)}$
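The pseudocode translates almost line for line into code. Below is a minimal sketch in Python with NumPy (the function name and data layout are ours, not the book's); it also checks, via assertions, that the primal score $\langle w^{(t)}, x_t \rangle$ agrees with the dual form $\eta \sum_{i \in M : i < t} y_i \langle x_i, x_t \rangle$ derived above, and that the set of update rounds is the same for different positive values of $\eta$.

    import numpy as np

    def perceptron(X, y, eta=1.0):
        # Online Perceptron sketch. X is a (T, d) array of instances and
        # y a (T,) array of labels in {-1, +1}. Returns the list M of
        # rounds on which an update was made.
        T, d = X.shape
        w = np.zeros(d)                      # initialize: w^(1) = 0
        M = []
        for t in range(T):
            x_t, y_t = X[t], y[t]
            # Dual form of the score: eta * sum_{i in M, i < t} y_i <x_i, x_t>
            dual_score = eta * sum(y[i] * np.dot(X[i], x_t) for i in M)
            assert np.isclose(np.dot(w, x_t), dual_score)  # primal == dual
            if y_t * np.dot(w, x_t) <= 0:    # update rule (ties trigger an update)
                w = w + eta * y_t * x_t      # w^(t+1) = w^(t) + eta * y_t * x_t
                M.append(t)
        return M

    # The update rounds do not depend on eta, matching the claim above
    # (eta = 0.5 keeps the floating-point scaling exact):
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sign(X @ rng.normal(size=5))
    assert perceptron(X, y, eta=1.0) == perceptron(X, y, eta=0.5)

Note that the condition $y_t \langle w^{(t)}, x_t \rangle \le 0$ treats a zero margin as a mistake, so the very first round always triggers an update (since $w^{(1)} = 0$), exactly as in the pseudocode.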
To analyze the Perceptron, we rely on the analysis of Online Gradient Descent given in the previous section. In our case, the subgradient of $f_t$ we use in the Perceptron is $v_t = -\mathbb{1}_{[y_t \langle w^{(t)}, x_t \rangle \le 0]}\, y_t x_t$. Indeed, the Perceptron's update is $w^{(t+1)} = w^{(t)} - v_t$, and as discussed before this is equivalent to $w^{(t+1)} = w^{(t)} - \eta v_t$ for every $\eta > 0$. Therefore, Theorem 21.15 tells us that
\[
\sum_{t=1}^{T} f_t(w^{(t)}) - \sum_{t=1}^{T} f_t(w^\star) \le \frac{1}{2\eta}\,\|w^\star\|_2^2 + \frac{\eta}{2} \sum_{t=1}^{T} \|v_t\|_2^2 .
\]

Since $f_t(w^{(t)})$ is a surrogate for the 0-1 loss, it is nonnegative and is at least $1$ on every round in $M$, so $\sum_{t=1}^{T} f_t(w^{(t)}) \ge |M|$. Denote $R = \max_t \|x_t\|$; since $v_t$ vanishes on rounds with no update, while $\|v_t\| = \|x_t\| \le R$ on the rounds in $M$, we obtain
\[
|M| - \sum_{t=1}^{T} f_t(w^\star) \le \frac{1}{2\eta}\,\|w^\star\|_2^2 + \frac{\eta}{2}\,|M|\, R^2 .
\]

Setting $\eta = \frac{\|w^\star\|}{R\sqrt{|M|}}$, both terms on the right-hand side become $\frac{1}{2} R\,\|w^\star\|\,\sqrt{|M|}$; rearranging, we obtain
\[
|M| - R\,\|w^\star\|\,\sqrt{|M|} - \sum_{t=1}^{T} f_t(w^\star) \le 0 . \tag{21.6}
\]
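It may help to see why an inequality of the shape of (21.6) pins down $|M|$. Substituting $x = \sqrt{|M|}$, $b = R\,\|w^\star\|$, and $c = \sum_{t=1}^T f_t(w^\star)$, inequality (21.6) reads $x^2 - b x - c \le 0$, a quadratic in $x$. A loose sketch of the resulting bound (the constants here are ours and not the sharpest possible):
\[
x \le \frac{b + \sqrt{b^2 + 4c}}{2}
\quad\Longrightarrow\quad
|M| = x^2 \le \frac{\big(b + \sqrt{b^2 + 4c}\big)^2}{4} \le b^2 + 2c ,
\]
where the last step uses $(u+v)^2 \le 2u^2 + 2v^2$. In particular, if $w^\star$ achieves zero surrogate loss on every round (the separable case), this already gives $|M| \le R^2\,\|w^\star\|^2$.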
This inequality implies