Understanding Machine Learning: From Theory to Algorithms

8.2 Implementing the ERM Rule

8.2.2 Axis Aligned Rectangles


Let H_n be the class of axis aligned rectangles in R^n, namely,

    H_n = { h_(a_1,...,a_n,b_1,...,b_n) : ∀i, a_i ≤ b_i }

where

    h_(a_1,...,a_n,b_1,...,b_n)(x) = 1 if ∀i, x_i ∈ [a_i, b_i]
                                     0 otherwise                  (8.1)
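As an illustration, the hypothesis in equation (8.1) can be sketched in Python with NumPy. The function name `rect_hypothesis` is ours, not from the text; it simply packages the parameters (a_1,...,a_n,b_1,...,b_n) into a classifier.

```python
import numpy as np

def rect_hypothesis(a, b):
    """Return the classifier h_(a_1,...,a_n,b_1,...,b_n) of equation (8.1):
    it labels x with 1 iff x_i ∈ [a_i, b_i] for every coordinate i."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if not np.all(a <= b):
        raise ValueError("membership in H_n requires a_i <= b_i for all i")

    def h(x):
        x = np.asarray(x, dtype=float)
        # 1 iff every coordinate lies inside its interval [a_i, b_i]
        return int(np.all((a <= x) & (x <= b)))

    return h
```

For example, `rect_hypothesis([0, 0], [2, 3])` labels the point (1, 1) with 1 and the point (3, 1) with 0, since the first coordinate 3 falls outside [0, 2].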

Efficiently Learnable in the Realizable Case


Consider implementing the ERM rule in the realizable case. That is, we are given
a training set S = (x_1, y_1), ..., (x_m, y_m) of examples, such that there exists an
axis aligned rectangle, h ∈ H_n, for which h(x_i) = y_i for all i. Our goal is to find
such an axis aligned rectangle with zero training error, namely, a rectangle
that is consistent with all the labels in S.

This can be done in time O(nm). Indeed, for each i ∈ [n],
set a_i = min{x_i : (x, 1) ∈ S} and b_i = max{x_i : (x, 1) ∈ S}. In words, we take
a_i to be the minimal value of the i'th coordinate of a positive example in S and
b_i to be the maximal value of the i'th coordinate of a positive example in S.
The resulting rectangle is the tightest one enclosing all positive examples; since by
realizability some consistent rectangle exists, the tightest one is contained in it and
therefore also labels every negative example with 0, so it has zero training error.
The runtime of finding each a_i and b_i is O(m). Hence, the total runtime of this
procedure is O(nm).
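The procedure above can be sketched as follows. This is a minimal NumPy sketch under the realizability assumption (at least one positive example, and some consistent rectangle exists); the names `erm_rectangle` and `predict` are ours.

```python
import numpy as np

def erm_rectangle(X, y):
    """ERM for axis aligned rectangles in the realizable case.

    X: (m, n) array of examples; y: (m,) array of 0/1 labels.
    Returns (a, b), the per-coordinate bounds of the tightest
    rectangle enclosing all positive examples.
    """
    pos = X[y == 1]          # positive examples; nonempty under realizability
    a = pos.min(axis=0)      # a_i = min of the i'th coordinate over positives, O(m)
    b = pos.max(axis=0)      # b_i = max of the i'th coordinate over positives, O(m)
    return a, b              # total runtime O(nm)

def predict(a, b, X):
    """Label each row of X with 1 iff every coordinate x_i lies in [a_i, b_i]."""
    return np.all((X >= a) & (X <= b), axis=1).astype(int)
```

On a realizable sample, `predict(a, b, X)` reproduces the training labels exactly, i.e., the returned rectangle has zero training error.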

Not Efficiently Learnable in the Agnostic Case


In the agnostic case, we do not assume that some hypothesis h perfectly predicts
the labels of all the examples in the training set. Our goal is therefore to find
h that minimizes the number of examples for which y_i ≠ h(x_i). It turns out
that for many common hypothesis classes, including the classes of axis aligned
rectangles we consider here, solving the ERM problem in the agnostic setting is
NP-hard (and, in most cases, it is even NP-hard to find some h ∈ H whose error
is no more than some constant c > 1 times that of the empirical risk minimizer
in H). That is, unless P = NP, there is no algorithm whose running time is
polynomial in m and n that is guaranteed to find an ERM hypothesis for these
problems (Ben-David, Eiron & Long 2003).
On the other hand, it is worthwhile noticing that, if we fix one specific hypothesis
class, say, axis aligned rectangles in some fixed dimension, n, then there exist
efficient learning algorithms for this class. In other words, there are successful
agnostic PAC learners that run in time polynomial in 1/ε and 1/δ (but their
dependence on the dimension n is not polynomial).

To see this, recall the implementation of the ERM rule we presented for the
realizable case, from which it follows that an axis aligned rectangle is determined
by at most 2n examples. Therefore, given a training set of size m, we can perform
an exhaustive search over all subsets of the training set of size at most 2n
examples and construct a rectangle from each such subset. Then, we can pick