Understanding Machine Learning: From Theory to Algorithms


over Z, and input ε, δ ∈ (0,1), when A has access to samples generated i.i.d.
by D,


  • A terminates after performing at most c·f(ε, δ) operations

  • The output of A, denoted h_A, can be applied to predict the label of a new
    example while performing at most c·f(ε, δ) operations

  • The output of A is probably approximately correct; namely, with
    probability of at least 1 − δ (over the random samples A receives),
    L_D(h_A) ≤ min_{h′∈H} L_D(h′) + ε



  2. Consider a sequence of learning problems, (Z_n, H_n, ℓ_n)_{n=1}^∞, where
     problem n is defined by a domain Z_n, a hypothesis class H_n, and a loss
     function ℓ_n. Let A be a learning algorithm designed for solving learning
     problems of this form. Given a function g : N × (0,1)^2 → N, we say that
     the runtime of A with respect to the preceding sequence is O(g) if, for
     all n, A solves the problem (Z_n, H_n, ℓ_n) in time O(f_n), where
     f_n : (0,1)^2 → N is defined by f_n(ε, δ) = g(n, ε, δ).


We say that A is an efficient algorithm with respect to a sequence
(Z_n, H_n, ℓ_n) if its runtime is O(p(n, 1/ε, 1/δ)) for some polynomial p.
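
For instance (a hypothetical illustration, not an example from the text), an
algorithm A whose runtime on the n-th problem is O(f_n) with
f_n(ε, δ) = g(n, ε, δ) = n^2 · log(1/δ)/ε^2 is efficient with respect to the
sequence, since g is polynomial in n, 1/ε, and 1/δ; a runtime of the form
2^n/ε, by contrast, would not qualify.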

From this definition we see that the question of whether a general learning
problem can be solved efficiently depends on how it is broken into a sequence
of specific learning problems. For example, consider the problem of learning a
finite hypothesis class. As we showed in previous chapters, the ERM rule over
H is guaranteed to (ε, δ)-learn H if the number of training examples is on the
order of m_H(ε, δ) = log(|H|/δ)/ε^2. Assuming that the evaluation of a
hypothesis on an example takes constant time, it is possible to implement the
ERM rule in time O(|H| · m_H(ε, δ)) by performing an exhaustive search over H
with a training set of size m_H(ε, δ). For any fixed finite H, the exhaustive
search algorithm runs in polynomial time. Furthermore, if we define a sequence
of problems in which |H_n| = n, then the exhaustive search is still considered
efficient. However, if we define a sequence of problems for which |H_n| = 2^n,
then the sample complexity is still polynomial in n but the computational
complexity of the exhaustive search algorithm grows exponentially with n,
rendering it inefficient.
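
To make the exhaustive-search implementation concrete, the following is a
minimal Python sketch under stated assumptions: the 0-1 loss, the toy class of
threshold classifiers over [0, 1], and the names erm_exhaustive and
sample_complexity are illustrative choices, not part of the text. The sketch
draws a training set of size m_H(ε, δ), evaluates every hypothesis in the
finite class on it, and returns an empirical-loss minimizer, so its running
time is O(|H| · m_H(ε, δ)) when evaluating one hypothesis on one example takes
constant time.

    import math
    import random

    def sample_complexity(h_size, eps, delta):
        # m_H(eps, delta) = log(|H| / delta) / eps^2, up to a constant factor.
        return math.ceil(math.log(h_size / delta) / eps ** 2)

    def erm_exhaustive(hypotheses, sample, loss):
        # Exhaustive-search ERM: scan the finite class and return a hypothesis
        # minimizing the empirical loss on the sample.  Runtime is
        # O(|hypotheses| * |sample|) hypothesis evaluations.
        best_h, best_loss = None, float("inf")
        for h in hypotheses:
            emp_loss = sum(loss(h, z) for z in sample) / len(sample)
            if emp_loss < best_loss:
                best_h, best_loss = h, emp_loss
        return best_h

    # Toy problem (an assumption for illustration): threshold classifiers on
    # [0, 1] with the 0-1 loss; labels come from a threshold at 0.42.
    H = [lambda x, t=t: int(x >= t) for t in [i / 10 for i in range(11)]]
    zero_one_loss = lambda h, z: int(h(z[0]) != z[1])

    eps, delta = 0.1, 0.05
    m = sample_complexity(len(H), eps, delta)
    S = [(x, int(x >= 0.42)) for x in (random.random() for _ in range(m))]
    h_hat = erm_exhaustive(H, S, zero_one_loss)

Replacing |H_n| = n by |H_n| = 2^n leaves m polynomial in n but makes the loop
in erm_exhaustive run for exponentially many iterations, which is exactly the
gap between sample complexity and computational complexity noted above.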

8.2 Implementing the ERM Rule


Given a hypothesis class H, the ERM_H rule is perhaps the most natural learning
paradigm. Furthermore, for binary classification problems we saw that if
learning is at all possible, then it is possible with the ERM rule. In this
section we discuss the computational complexity of implementing the ERM rule
for several hypothesis classes.
Given a hypothesis class H, a domain set Z, and a loss function ℓ, the
corresponding ERM_H rule can be defined as follows: on a finite input sample
S = (z_1, . . . , z_m), output some hypothesis h ∈ H that minimizes the
empirical loss L_S(h) = (1/m) Σ_{i=1}^{m} ℓ(h, z_i).