Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

228 Multiclass, Ranking, and Complex Prediction Problems


sifiers, each of which discriminates between one class and the rest of the classes.
That is, given a training setS= (x 1 ,y 1 ),...,(xm,ym), where everyyiis inY, we
constructkbinary training sets,S 1 ,...,Sk, whereSi= (x 1 ,(−1)^1 [y^16 =i]),...,(xm,(−1)^1 [ym^6 =i]).
In words,Siis the set of instances labeled 1 if their label inSwasi, and− 1
otherwise. For everyi∈[k] we train a binary predictorhi:X →{± 1 }based on
Si, hoping thathi(x) should equal 1 if and only ifxbelongs to classi. Then,
givenh 1 ,...,hk, we construct a multiclass predictor using the rule

h(x)∈argmax
i∈[k]

hi(x). (17.1)

When more than one binary hypothesis predicts “1” we should somehow decide
which class to predict (e.g., we can arbitrarily decide to break ties by taking the
minimal index in argmaxihi(x)). A better approach can be applied whenever
eachhihides additional information, which can be interpreted as the confidence
in the predictiony=i. For example, this is the case in halfspaces, where the
actual prediction is sign(〈w,x〉), but we can interpret〈w,x〉as the confidence
in the prediction. In such cases, we can apply the multiclass rule given in Equa-
tion (17.1) on the real valued predictions. A pseudocode of the One-versus-All
approach is given in the following.

One-versus-All
input:
training setS= (x 1 , y 1 ),...,(xm, ym)
algorithm for binary classificationA
foreachi∈Y
letSi= (x 1 ,(−1)^1 [y^16 =i]),...,(xm,(−1)^1 [ym^6 =i])
lethi=A(Si)
output:
the multiclass hypothesis defined byh(x)∈argmaxi∈Yhi(x)

Another popular reduction is theAll-Pairs approach, in which all pairs of
classes are compared to each other. Formally, given a training setS= (x 1 ,y 1 ),...,(xm,ym),
where everyyiis in [k], for every 1≤i < j≤kwe construct a binary training
sequence,Si,j, containing all examples fromSwhose label is eitheriorj. For
each such an example, we set the binary label inSi,jto be +1 if the multiclass
label inSisiand−1 if the multiclass label inSisj. Next, we train a binary
classification algorithm based on everySi,j to gethi,j. Finally, we construct
a multiclass classifier by predicting the class that had the highest number of
“wins.” A pseudocode of the All-Pairs approach is given in the following.
Free download pdf