17.8 Exercises 249
Multiclass Batch Perceptron
Input:
A training set (x 1 ,y 1 ),...,(xm,ym)
A class-sensitive feature mapping Ψ :X ×Y →Rd
Initialize: w(1)= (0,...,0)∈Rd
Fort= 1, 2 ,...
If(∃iandy 6 =yi s.t. 〈w(t),Ψ(xi,yi)〉≤〈w(t),Ψ(xi,y)〉) then
w(t+1)=w(t)+ Ψ(xi,yi)−Ψ(xi,y)
else
output w(t)
Prove the following:
theorem17.5 Assume that there existsw?such that for alliand for all
y 6 =yiit holds that〈w?,Ψ(xi,yi)〉≥〈w?,Ψ(xi,y)〉+1. LetR= maxi,y‖Ψ(xi,yi)−
Ψ(xi,y)‖. Then, the multiclass Perceptron algorithm stops after at most(R‖w?‖)^2
iterations, and when it stops it holds that∀i∈[m], yi= argmaxy〈w(t),Ψ(xi,y)〉.
- Generalize the dynamic programming procedure given in Section 17.3 for solv-
ing the maximization problem given in the definition ofˆhin the SGD proce-
dure for multiclass prediction. You can assume that ∆(y′,y) =
∑r
t=1δ(y
t′,yt)
for some arbitrary functionδ.
- Prove that Equation (17.7) holds.
- Show that the two definitions ofπas defined in Equation (17.12) and Equa-
tion (17.13) are indeed equivalent for all the multivariate performance mea-
sures.