Understanding Machine Learning: From Theory to Algorithms

228 Multiclass, Ranking, and Complex Prediction Problems

sifiers, each of which discriminates between one class and the rest of the classes. That is, given a training setS= (x 1 ,y 1 ),...,(xm,ym), where everyyiis inY, we constructkbinary training sets,S 1 ,...,Sk, whereSi= (x 1 ,(−1)^1 [y^16 =i]),...,(xm,(−1)^1 [ym^6 =i]). In words,Siis the set of instances labeled 1 if their label inSwasi, and− 1 otherwise. For everyi∈[k] we train a binary predictorhi:X →{± 1 }based on Si, hoping thathi(x) should equal 1 if and only ifxbelongs to classi. Then, givenh 1 ,...,hk, we construct a multiclass predictor using the rule

h(x)∈argmax i∈[k]

hi(x). (17.1)

When more than one binary hypothesis predicts “1” we should somehow decide which class to predict (e.g., we can arbitrarily decide to break ties by taking the minimal index in argmaxihi(x)). A better approach can be applied whenever eachhihides additional information, which can be interpreted as the confidence in the predictiony=i. For example, this is the case in halfspaces, where the actual prediction is sign(〈w,x〉), but we can interpret〈w,x〉as the confidence in the prediction. In such cases, we can apply the multiclass rule given in Equa- tion (17.1) on the real valued predictions. A pseudocode of the One-versus-All approach is given in the following.

One-versus-All input: training setS= (x 1 , y 1 ),...,(xm, ym) algorithm for binary classificationA foreachi∈Y letSi= (x 1 ,(−1)^1 [y^16 =i]),...,(xm,(−1)^1 [ym^6 =i]) lethi=A(Si) output: the multiclass hypothesis defined byh(x)∈argmaxi∈Yhi(x)

Another popular reduction is theAll-Pairs approach, in which all pairs of classes are compared to each other. Formally, given a training setS= (x 1 ,y 1 ),...,(xm,ym), where everyyiis in [k], for every 1≤i < j≤kwe construct a binary training sequence,Si,j, containing all examples fromSwhose label is eitheriorj. For each such an example, we set the binary label inSi,jto be +1 if the multiclass label inSisiand−1 if the multiclass label inSisj. Next, we train a binary classification algorithm based on everySi,j to gethi,j. Finally, we construct a multiclass classifier by predicting the class that had the highest number of “wins.” A pseudocode of the All-Pairs approach is given in the following.

Understanding Machine Learning: From Theory to Algorithms

Get our desktop app

Company

Features

Documentation

Resources