Understanding Machine Learning: From Theory to Algorithms

17.8 Exercises

Multiclass Batch Perceptron

Input:
  A training set (x_1, y_1), ..., (x_m, y_m)
  A class-sensitive feature mapping Ψ : X × Y → R^d
Initialize: w^(1) = (0, ..., 0) ∈ R^d
For t = 1, 2, ...
  If (∃ i and y ≠ y_i s.t. ⟨w^(t), Ψ(x_i, y_i)⟩ ≤ ⟨w^(t), Ψ(x_i, y)⟩) then
    w^(t+1) = w^(t) + Ψ(x_i, y_i) − Ψ(x_i, y)
  else
    output w^(t)
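The update loop above can be sketched in Python. This is a minimal illustration, not the book's code: the argument names (`psi` for the feature map Ψ, `classes` for the label set Y) and the `max_iter` safety cap are assumptions of this sketch; the algorithm as stated loops until no constraint is violated.

```python
import numpy as np

def multiclass_batch_perceptron(X, y, psi, classes, max_iter=1000):
    """Multiclass Batch Perceptron sketch.

    X, y   : lists of training examples and their labels
    psi    : class-sensitive feature map psi(x, y) -> vector in R^d
    classes: the label set Y
    """
    d = psi(X[0], y[0]).shape[0]
    w = np.zeros(d)                          # w^(1) = (0, ..., 0)
    for _ in range(max_iter):                # safety cap; the book loops until convergence
        update = None
        # Search for a violated constraint: some i and y' != y_i with
        # <w, psi(x_i, y_i)> <= <w, psi(x_i, y')>.
        for xi, yi in zip(X, y):
            for yp in classes:
                if yp != yi and w @ psi(xi, yi) <= w @ psi(xi, yp):
                    update = psi(xi, yi) - psi(xi, yp)
                    break
            if update is not None:
                break
        if update is None:                   # no violation: output w^(t)
            return w
        w = w + update                       # w^(t+1) = w^(t) + psi(x_i, y_i) - psi(x_i, y)
    return w
```

Note that each iteration fixes one violated example at a time; Theorem 17.5 below is exactly the claim that, under the margin assumption, this loop terminates.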

Prove the following:

Theorem 17.5  Assume that there exists w⋆ such that for all i and for all y ≠ y_i it holds that ⟨w⋆, Ψ(x_i, y_i)⟩ ≥ ⟨w⋆, Ψ(x_i, y)⟩ + 1. Let R = max_{i,y} ‖Ψ(x_i, y_i) − Ψ(x_i, y)‖. Then, the multiclass Perceptron algorithm stops after at most (R‖w⋆‖)² iterations, and when it stops it holds that ∀i ∈ [m], y_i = argmax_y ⟨w^(t), Ψ(x_i, y)⟩.


  2. Generalize the dynamic programming procedure given in Section 17.3 for solving the maximization problem given in the definition of ĥ in the SGD procedure for multiclass prediction. You can assume that ∆(y′, y) = ∑_{t=1}^{r} δ(y′_t, y_t) for some arbitrary function δ.


  3. Prove that Equation (17.7) holds.

  4. Show that the two definitions of π as defined in Equation (17.12) and Equation (17.13) are indeed equivalent for all the multivariate performance measures.
