Understanding Machine Learning: From Theory to Algorithms

9.1 Halfspaces

$y_i\langle w^\star, x_i\rangle \ge 1$ for all $i$, and among all vectors that satisfy these constraints, $w^\star$ is of minimal norm.
The idea of the proof is to show that after performing $T$ iterations, the cosine of the angle between $w^\star$ and $w^{(T+1)}$ is at least $\frac{\sqrt{T}}{RB}$. That is, we will show that
$$\frac{\langle w^\star, w^{(T+1)}\rangle}{\|w^\star\|\,\|w^{(T+1)}\|} \;\ge\; \frac{\sqrt{T}}{RB}. \tag{9.2}$$

By the Cauchy-Schwarz inequality, the left-hand side of Equation (9.2) is at most 1. Therefore, Equation (9.2) would imply that
$$1 \;\ge\; \frac{\sqrt{T}}{RB} \quad\Rightarrow\quad T \le (RB)^2,$$
which will conclude our proof.
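To make concrete what $T$ counts, here is a minimal NumPy sketch of the Perceptron loop that the proof analyzes. It assumes the data are given as arrays `X` and `y` with labels in $\{-1,+1\}$; the function name and data layout are ours, added for illustration, not the book's code.

```python
import numpy as np

def perceptron(X, y):
    """Perceptron: repeat the update w <- w + y_i * x_i on a mistake.

    Returns the final weights and the number of updates T; on separable
    data the theorem above bounds T by (R*B)**2.
    """
    w = np.zeros(X.shape[1])               # w^(1) = (0, ..., 0)
    T = 0
    while True:
        mistakes = np.where(y * (X @ w) <= 0)[0]
        if mistakes.size == 0:             # every example has positive margin
            return w, T
        i = mistakes[0]                    # any misclassified example will do
        w = w + y[i] * X[i]                # the update analyzed in the proof
        T += 1
```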
To show that Equation (9.2) holds, we first show that $\langle w^\star, w^{(T+1)}\rangle \ge T$. Indeed, at the first iteration, $w^{(1)} = (0,\ldots,0)$ and therefore $\langle w^\star, w^{(1)}\rangle = 0$, while on iteration $t$, if we update using example $(x_i, y_i)$ (recall that the update rule is $w^{(t+1)} = w^{(t)} + y_i x_i$) we have that
$$\langle w^\star, w^{(t+1)}\rangle - \langle w^\star, w^{(t)}\rangle = \langle w^\star, w^{(t+1)} - w^{(t)}\rangle = \langle w^\star, y_i x_i\rangle = y_i\langle w^\star, x_i\rangle \;\ge\; 1.$$

Therefore, after performing $T$ iterations, we get
$$\langle w^\star, w^{(T+1)}\rangle = \sum_{t=1}^{T}\Bigl(\langle w^\star, w^{(t+1)}\rangle - \langle w^\star, w^{(t)}\rangle\Bigr) \;\ge\; T, \tag{9.3}$$
as required.
Next, we upper bound $\|w^{(T+1)}\|$. For each iteration $t$ we have that
$$\|w^{(t+1)}\|^2 = \|w^{(t)} + y_i x_i\|^2 = \|w^{(t)}\|^2 + 2 y_i\langle w^{(t)}, x_i\rangle + y_i^2\|x_i\|^2 \;\le\; \|w^{(t)}\|^2 + R^2, \tag{9.4}$$
where the last inequality is due to the fact that example $i$ is necessarily such that $y_i\langle w^{(t)}, x_i\rangle \le 0$, and the norm of $x_i$ is at most $R$. Now, since $\|w^{(1)}\|^2 = 0$, if we use Equation (9.4) recursively for $T$ iterations, we obtain that
$$\|w^{(T+1)}\|^2 \le T R^2 \quad\Rightarrow\quad \|w^{(T+1)}\| \le \sqrt{T}\,R. \tag{9.5}$$
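Both bounds are easy to check numerically by recording the two quantities after every update. The sketch below does this for any vector `w_ref` satisfying the constraints $y_i\langle w_{\mathrm{ref}}, x_i\rangle \ge 1$, so that `w_ref` can stand in for $w^\star$; the assertions, tolerances, and function name are ours, added for illustration.

```python
import numpy as np

def perceptron_trace(X, y, w_ref):
    """Perceptron run asserting the proof's invariants after each update:
    <w_ref, w^(t+1)> >= t   (Equation (9.3); needs y_i*<w_ref, x_i> >= 1)
    ||w^(t+1)||^2 <= t*R^2  (Equations (9.4)-(9.5))."""
    R2 = np.max(np.sum(X ** 2, axis=1))   # R^2 = max_i ||x_i||^2
    w = np.zeros(X.shape[1])
    t = 0
    while True:
        mistakes = np.where(y * (X @ w) <= 0)[0]
        if mistakes.size == 0:
            return w, t
        i = mistakes[0]
        w = w + y[i] * X[i]
        t += 1
        assert w_ref @ w >= t - 1e-9      # Equation (9.3)
        assert w @ w <= t * R2 + 1e-9     # Equation (9.5)
```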

Combining Equation (9.3) with Equation (9.5), and using the fact that $\|w^\star\| = B$, we obtain that
$$\frac{\langle w^{(T+1)}, w^\star\rangle}{\|w^\star\|\,\|w^{(T+1)}\|} \;\ge\; \frac{T}{B\,\sqrt{T}\,R} \;=\; \frac{\sqrt{T}}{B R}.$$

We have thus shown that Equation (9.2) holds, and this concludes our proof.
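As a sanity check of the bound $T \le (RB)^2$, one can run the sketches above on synthetic separable data. Any vector $w_{\mathrm{ref}}$ satisfying the constraints has $\|w_{\mathrm{ref}}\| \ge B$, since $B$ is the minimal feasible norm, so $(R\|w_{\mathrm{ref}}\|)^2 \ge (RB)^2$ and the observed update count must stay below it. The data-generation scheme below is our own illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a separable sample: keep only points whose margin under a fixed
# reference vector is at least 1, so w_ref satisfies y_i*<w_ref, x_i> >= 1.
w_ref = rng.normal(size=5)
X = rng.normal(size=(1000, 5))
y = np.sign(X @ w_ref)
keep = y * (X @ w_ref) >= 1.0
X, y = X[keep], y[keep]

w, T = perceptron_trace(X, y, w_ref)       # sketch from above
R = np.max(np.linalg.norm(X, axis=1))      # R = max_i ||x_i||
bound = (R * np.linalg.norm(w_ref)) ** 2   # >= (R*B)^2 since ||w_ref|| >= B
print(f"updates T = {T}, bound (R*||w_ref||)^2 = {bound:.1f}")
assert T <= bound
```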
