Understanding Machine Learning: From Theory to Algorithms


Solving Equation (17.14)

input:
  (x_1, ..., x_r), (y_1, ..., y_r), w, V, Δ
assumptions:
  Δ is a function of a, b, c, d
  V contains all vectors for which f(a, b) = 1 for some function f
initialize:
  P = |{i : y_i = 1}|, N = |{i : y_i = -1}|
  μ = (⟨w, x_1⟩, ..., ⟨w, x_r⟩), α* = -∞
  sort examples so that μ_1 ≥ μ_2 ≥ ... ≥ μ_r
  let i_1, ..., i_P be the (sorted) indices of the positive examples
  let j_1, ..., j_N be the (sorted) indices of the negative examples
for a = 0, 1, ..., P
  c = P - a
  for b = 0, 1, ..., N such that f(a, b) = 1
    d = N - b
    calculate Δ using a, b, c, d
    set v_1, ..., v_r s.t. v_{i_1} = ... = v_{i_a} = v_{j_1} = ... = v_{j_b} = 1
      and the rest of the elements of v equal -1
    α = Δ + Σ_{i=1}^r v_i μ_i
    if α ≥ α*
      α* = α, v* = v
output v*
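In words, the procedure performs the loss-augmented search argmax_{v ∈ V} (Δ(v, y) + Σ_{i=1}^r v_i ⟨w, x_i⟩): for each admissible confusion-matrix profile (a, b), the inner sum is maximized by predicting 1 on the a highest-scoring positive examples and the b highest-scoring negative examples, so only (P+1)(N+1) candidates need to be checked. The following Python sketch implements this search under stated assumptions: the names solve_eq_17_14, f, and delta are ours, f encodes membership in V, and the particular Δ used at the end (one minus the averaged sensitivity/specificity) is purely illustrative, not prescribed by the text.

    import numpy as np

    def solve_eq_17_14(x, y, w, f, delta):
        # x: (r, dim) array of instances; y: length-r array with entries in {+1, -1};
        # w: weight vector; f(a, b) -> bool defines V; delta(a, b, c, d) -> float.
        y = np.asarray(y)
        mu = np.asarray(x) @ np.asarray(w)    # mu_i = <w, x_i>
        order = np.argsort(-mu)               # indices sorted by decreasing mu_i
        pos = order[y[order] == 1]            # i_1, ..., i_P
        neg = order[y[order] == -1]           # j_1, ..., j_N
        P, N = len(pos), len(neg)
        alpha_star, v_star = -np.inf, None
        for a in range(P + 1):
            c = P - a
            for b in range(N + 1):
                if not f(a, b):               # skip (a, b) profiles outside V
                    continue
                d = N - b
                v = -np.ones(len(y))
                v[pos[:a]] = 1                # top-a positives predicted +1
                v[neg[:b]] = 1                # top-b negatives predicted +1
                alpha = delta(a, b, c, d) + v @ mu
                if alpha >= alpha_star:
                    alpha_star, v_star = alpha, v
        return v_star

    # Illustrative usage (our choices, not from the text): no constraint on (a, b),
    # and Delta equal to one minus the average of sensitivity and specificity.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    yv = rng.choice([-1, 1], size=20)
    wv = rng.normal(size=5)
    v_opt = solve_eq_17_14(
        X, yv, wv,
        f=lambda a, b: True,
        delta=lambda a, b, c, d: 1 - 0.5 * (a / max(a + c, 1) + d / max(b + d, 1)))

After the initial sort, the double loop evaluates at most (P+1)(N+1) profiles, each in time O(r), so the search runs in time polynomial in r rather than scanning all 2^r sign vectors.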

17.6 Summary


Many real-world supervised learning problems can be cast as learning a multiclass
predictor. We started the chapter by introducing reductions of multiclass learning
to binary learning. We then described and analyzed the family of linear predictors
for multiclass learning, and showed how this family can be used even when the
number of classes is extremely large, provided the problem has adequate structure.
Finally, we described ranking problems. In Chapter 29 we study the sample
complexity of multiclass learning in more detail.

17.7 Bibliographic Remarks


The One-versus-All and All-Pairs reductions have been unified under the
framework of Error Correcting Output Codes (ECOC) (Dietterich &
Bakiri 1995, Allwein, Schapire & Singer 2000). There are also other types of
reductions, such as tree-based classifiers (see, for example, Beygelzimer, Langford
& Ravikumar (2007)). The limitations of reduction techniques have been studied