Understanding Machine Learning: From Theory to Algorithms

17.8 Exercises 249

Multiclass Batch Perceptron

Input: A training set (x 1 ,y 1 ),...,(xm,ym) A class-sensitive feature mapping Ψ :X ×Y →Rd Initialize: w(1)= (0,...,0)∈Rd Fort= 1, 2 ,... If(∃iandy 6 =yi s.t. 〈w(t),Ψ(xi,yi)〉≤〈w(t),Ψ(xi,y)〉) then w(t+1)=w(t)+ Ψ(xi,yi)−Ψ(xi,y) else output w(t)

Prove the following:

theorem17.5 Assume that there existsw?such that for alliand for all y 6 =yiit holds that〈w?,Ψ(xi,yi)〉≥〈w?,Ψ(xi,y)〉+1. LetR= maxi,y‖Ψ(xi,yi)− Ψ(xi,y)‖. Then, the multiclass Perceptron algorithm stops after at most(R‖w?‖)^2 iterations, and when it stops it holds that∀i∈[m], yi= argmaxy〈w(t),Ψ(xi,y)〉.

Generalize the dynamic programming procedure given in Section 17.3 for solv-
ing the maximization problem given in the definition ofˆhin the SGD proce-
dure for multiclass prediction. You can assume that ∆(y′,y) =

∑r t=1δ(y t′,yt) for some arbitrary functionδ.

Prove that Equation (17.7) holds.

Show that the two definitions ofπas defined in Equation (17.12) and Equa-
tion (17.13) are indeed equivalent for all the multivariate performance mea-
sures.

Understanding Machine Learning: From Theory to Algorithms

Get our desktop app

Company

Features

Documentation

Resources