Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

17 Multiclass, Ranking, and Complex Prediction Problems


Multiclass categorization is the problem of classifying instances into one of several
possible target classes. That is, we are aiming at learning a predictorh:X →Y,
whereYis a finite set of categories. Applications include, for example, catego-
rizing documents according to topic (Xis the set of documents andYis the set
of possible topics) or determining which object appears in a given image (Xis
the set of images andYis the set of possible objects).
The centrality of the multiclass learning problem has spurred the development
of various approaches for tackling the task. Perhaps the most straightforward
approach is a reduction from multiclass classification to binary classification. In
Section 17.1 we discuss the most common two reductions as well as the main
drawback of the reduction approach.
We then turn to describe a family of linear predictors for multiclass problems.
Relying on the RLM and SGD frameworks from previous chapters, we describe
several practical algorithms for multiclass prediction.
In Section 17.3 we show how to use the multiclass machinery for complex pre-
diction problems in whichYcan be extremely large but has some structure on
it. This task is often calledstructured output learning. In particular, we demon-
strate this approach for the task of recognizing handwritten words, in whichY
is the set of all possible strings of some bounded length (hence, the size ofYis
exponential in the maximal length of a word).
Finally, in Section 17.4 and Section 17.5 we discuss ranking problems in which
the learner should order a set of instances according to their “relevance.” A typ-
ical application is ordering results of a search engine according to their relevance
to the query. We describe several performance measures that are adequate for
assessing the performance of ranking predictors and describe how to learn linear
predictors for ranking problems efficiently.

17.1 One-versus-All and All-Pairs


The simplest approach to tackle multiclass prediction problems is by reduction
to binary classification. Recall that in multiclass prediction we would like to learn
a functionh:X →Y. Without loss of generality let us denoteY={ 1 ,...,k}.
In the One-versus-All method (a.k.a. One-versus-Rest) we trainkbinary clas-

Understanding Machine Learning,©c2014 by Shai Shalev-Shwartz and Shai Ben-David
Published 2014 by Cambridge University Press.
Personal use only. Not for distribution. Do not post.
Please link tohttp://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning
Free download pdf