Understanding Machine Learning: From Theory to Algorithms


Solving Equation (17.14)

input:
  (x_1, ..., x_r), (y_1, ..., y_r), w, V, Δ
assumptions:
  Δ is a function of a, b, c, d
  V contains all vectors for which f(a, b) = 1 for some function f
initialize:
  P = |{i : y_i = 1}|, N = |{i : y_i = -1}|
  μ = (⟨w, x_1⟩, ..., ⟨w, x_r⟩), α* = -∞
  sort examples so that μ_1 ≥ μ_2 ≥ ... ≥ μ_r
  let i_1, ..., i_P be the (sorted) indices of the positive examples
  let j_1, ..., j_N be the (sorted) indices of the negative examples
for a = 0, 1, ..., P
  c = P - a
  for b = 0, 1, ..., N such that f(a, b) = 1
    d = N - b
    calculate Δ using a, b, c, d
    set v_1, ..., v_r s.t. v_{i_1} = ... = v_{i_a} = v_{j_1} = ... = v_{j_b} = 1
      and the rest of the elements of v equal -1
    α = Δ + Σ_{i=1}^r v_i μ_i
    if α ≥ α*
      α* = α, v* = v
output v*
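In words, the procedure performs the loss-augmented search argmax_{v ∈ V} (Δ(v, y) + Σ_{i=1}^r v_i ⟨w, x_i⟩): for each admissible confusion-matrix profile (a, b), the inner sum is maximized by predicting 1 on the a highest-scoring positive examples and the b highest-scoring negative examples, so only (P+1)(N+1) candidates need to be checked. The following Python sketch implements this search under stated assumptions: the names solve_eq_17_14, f, and delta are ours, f encodes membership in V, and the particular Δ used at the end (one minus the averaged sensitivity/specificity) is purely illustrative, not prescribed by the text.

    import numpy as np

    def solve_eq_17_14(x, y, w, f, delta):
        # x: (r, dim) array of instances; y: length-r array with entries in {+1, -1};
        # w: weight vector; f(a, b) -> bool defines V; delta(a, b, c, d) -> float.
        y = np.asarray(y)
        mu = np.asarray(x) @ np.asarray(w)    # mu_i = <w, x_i>
        order = np.argsort(-mu)               # indices sorted by decreasing mu_i
        pos = order[y[order] == 1]            # i_1, ..., i_P
        neg = order[y[order] == -1]           # j_1, ..., j_N
        P, N = len(pos), len(neg)
        alpha_star, v_star = -np.inf, None
        for a in range(P + 1):
            c = P - a
            for b in range(N + 1):
                if not f(a, b):               # skip (a, b) profiles outside V
                    continue
                d = N - b
                v = -np.ones(len(y))
                v[pos[:a]] = 1                # top-a positives predicted +1
                v[neg[:b]] = 1                # top-b negatives predicted +1
                alpha = delta(a, b, c, d) + v @ mu
                if alpha >= alpha_star:
                    alpha_star, v_star = alpha, v
        return v_star

    # Illustrative usage (our choices, not from the text): no constraint on (a, b),
    # and Delta equal to one minus the average of sensitivity and specificity.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 5))
    yv = rng.choice([-1, 1], size=20)
    wv = rng.normal(size=5)
    v_opt = solve_eq_17_14(
        X, yv, wv,
        f=lambda a, b: True,
        delta=lambda a, b, c, d: 1 - 0.5 * (a / max(a + c, 1) + d / max(b + d, 1)))

After the initial sort, the double loop evaluates at most (P+1)(N+1) profiles, each in time O(r), so the search runs in time polynomial in r rather than scanning all 2^r sign vectors.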

17.6 Summary


Many real-world supervised learning problems can be cast as learning a multiclass
predictor. We started the chapter by introducing reductions of multiclass learning
to binary learning. We then described and analyzed the family of linear predictors
for multiclass learning, and showed how this family can be used even when the
number of classes is extremely large, provided the problem has adequate structure.
Finally, we described ranking problems. In Chapter 29 we study the sample
complexity of multiclass learning in more detail.

17.7 Bibliographic Remarks


The One-versus-All and All-Pairs reductions have been unified under the
framework of Error Correcting Output Codes (ECOC) (Dietterich &
Bakiri 1995, Allwein, Schapire & Singer 2000). There are also other types of
reductions, such as tree-based classifiers (see, for example, Beygelzimer, Langford
& Ravikumar (2007)). The limitations of reduction techniques have been studied