Understanding Machine Learning: From Theory to Algorithms


over Z, and input ε, δ ∈ (0,1), when A has access to samples generated i.i.d.
by D,


  • A terminates after performing at most c·f(ε, δ) operations

  • The output of A, denoted h_A, can be applied to predict the label of a new
    example while performing at most c·f(ε, δ) operations

  • The output of A is probably approximately correct; namely, with
    probability of at least 1 − δ (over the random samples A receives),
    L_D(h_A) ≤ min_{h′∈H} L_D(h′) + ε



  2. Consider a sequence of learning problems, (Z_n, H_n, ℓ_n)_{n=1}^∞, where
     problem n is defined by a domain Z_n, a hypothesis class H_n, and a loss
     function ℓ_n. Let A be a learning algorithm designed for solving learning
     problems of this form. Given a function g : N × (0,1)^2 → N, we say that
     the runtime of A with respect to the preceding sequence is O(g) if, for
     all n, A solves the problem (Z_n, H_n, ℓ_n) in time O(f_n), where
     f_n : (0,1)^2 → N is defined by f_n(ε, δ) = g(n, ε, δ).


We say that A is an efficient algorithm with respect to a sequence
(Z_n, H_n, ℓ_n) if its runtime is O(p(n, 1/ε, 1/δ)) for some polynomial p.
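
For instance (a hypothetical illustration, not an example from the text), an
algorithm A whose runtime on the n-th problem is O(f_n) with
f_n(ε, δ) = g(n, ε, δ) = n^2 · log(1/δ)/ε^2 is efficient with respect to the
sequence, since g is polynomial in n, 1/ε, and 1/δ; a runtime of the form
2^n/ε, by contrast, would not qualify.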

From this definition we see that the question of whether a general learning
problem can be solved efficiently depends on how it is broken into a sequence
of specific learning problems. For example, consider the problem of learning a
finite hypothesis class. As we showed in previous chapters, the ERM rule over
H is guaranteed to (ε, δ)-learn H if the number of training examples is on the
order of m_H(ε, δ) = log(|H|/δ)/ε^2. Assuming that the evaluation of a
hypothesis on an example takes constant time, it is possible to implement the
ERM rule in time O(|H| · m_H(ε, δ)) by performing an exhaustive search over H
with a training set of size m_H(ε, δ). For any fixed finite H, the exhaustive
search algorithm runs in polynomial time. Furthermore, if we define a sequence
of problems in which |H_n| = n, then the exhaustive search is still considered
efficient. However, if we define a sequence of problems for which |H_n| = 2^n,
then the sample complexity is still polynomial in n but the computational
complexity of the exhaustive search algorithm grows exponentially with n,
rendering it inefficient.
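
To make the exhaustive-search implementation concrete, the following is a
minimal Python sketch under stated assumptions: the 0-1 loss, the toy class of
threshold classifiers over [0, 1], and the names erm_exhaustive and
sample_complexity are illustrative choices, not part of the text. The sketch
draws a training set of size m_H(ε, δ), evaluates every hypothesis in the
finite class on it, and returns an empirical-loss minimizer, so its running
time is O(|H| · m_H(ε, δ)) when evaluating one hypothesis on one example takes
constant time.

    import math
    import random

    def sample_complexity(h_size, eps, delta):
        # m_H(eps, delta) = log(|H| / delta) / eps^2, up to a constant factor.
        return math.ceil(math.log(h_size / delta) / eps ** 2)

    def erm_exhaustive(hypotheses, sample, loss):
        # Exhaustive-search ERM: scan the finite class and return a hypothesis
        # minimizing the empirical loss on the sample.  Runtime is
        # O(|hypotheses| * |sample|) hypothesis evaluations.
        best_h, best_loss = None, float("inf")
        for h in hypotheses:
            emp_loss = sum(loss(h, z) for z in sample) / len(sample)
            if emp_loss < best_loss:
                best_h, best_loss = h, emp_loss
        return best_h

    # Toy problem (an assumption for illustration): threshold classifiers on
    # [0, 1] with the 0-1 loss; labels come from a threshold at 0.42.
    H = [lambda x, t=t: int(x >= t) for t in [i / 10 for i in range(11)]]
    zero_one_loss = lambda h, z: int(h(z[0]) != z[1])

    eps, delta = 0.1, 0.05
    m = sample_complexity(len(H), eps, delta)
    S = [(x, int(x >= 0.42)) for x in (random.random() for _ in range(m))]
    h_hat = erm_exhaustive(H, S, zero_one_loss)

Replacing |H_n| = n by |H_n| = 2^n leaves m polynomial in n but makes the loop
in erm_exhaustive run for exponentially many iterations, which is exactly the
gap between sample complexity and computational complexity noted above.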

8.2 Implementing the ERM Rule


Given a hypothesis class H, the ERM_H rule is perhaps the most natural learning
paradigm. Furthermore, for binary classification problems we saw that if
learning is at all possible, then it is possible with the ERM rule. In this
section we discuss the computational complexity of implementing the ERM rule
for several hypothesis classes.
Given a hypothesis class H, a domain set Z, and a loss function ℓ, the
corresponding ERM_H rule can be defined as follows: on a finite input sample
S = (z_1, . . . , z_m), output some hypothesis h ∈ H that minimizes the
empirical loss L_S(h) = (1/m) Σ_{i=1}^{m} ℓ(h, z_i).