Understanding Machine Learning: From Theory to Algorithms

2.3 Empirical Risk Minimization with Inductive Bias

commonly denoted by $\epsilon$. We interpret the event $L_{(\mathcal{D},f)}(h_S) > \epsilon$ as a failure of the learner, while if $L_{(\mathcal{D},f)}(h_S) \le \epsilon$ we view the output of the algorithm as an approximately correct predictor. Therefore (fixing some labeling function $f : \mathcal{X} \to \mathcal{Y}$), we are interested in upper bounding the probability of sampling an $m$-tuple of instances that will lead to failure of the learner. Formally, let $S|_x = (x_1, \ldots, x_m)$ be the instances of the training set. We would like to upper bound

$$\mathcal{D}^m\bigl(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\bigr).$$
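
To make this quantity concrete, here is a minimal Monte Carlo sketch (not from the book) that estimates the failure probability for a toy finite class; the uniform domain, the threshold class, and the true labeling function $h_3$ are all illustrative assumptions.

```python
# Illustrative sketch (not from the book): estimate
# D^m({S|x : L_(D,f)(h_S) > eps}) by Monte Carlo for a toy setup.
# Assumptions: X = {0,...,9} with uniform D, H = threshold functions
# h_t(x) = 1[x >= t] for t in 0..10, and f = h_3 (realizability holds).
import random

X = list(range(10))
thresholds = list(range(11))           # indexes the finite class H
f = lambda x: int(x >= 3)              # true labeling function (assumed)

def true_loss(t):
    # L_(D,f)(h_t): probability under uniform D that h_t(x) != f(x)
    return sum((x >= t) != f(x) for x in X) / len(X)

def erm(sample):
    # returns some h in H with zero empirical error (exists by realizability)
    return next(t for t in thresholds
                if all(int(x >= t) == y for x, y in sample))

def failure_prob(m, eps, trials=20000):
    fails = 0
    for _ in range(trials):
        S = [(x, f(x)) for x in random.choices(X, k=m)]  # S ~ D^m, labeled by f
        fails += true_loss(erm(S)) > eps
    return fails / trials

print(failure_prob(m=5, eps=0.2))      # the estimate shrinks as m grows
```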

Let $\mathcal{H}_B$ be the set of "bad" hypotheses, that is,

$$\mathcal{H}_B = \{h \in \mathcal{H} : L_{(\mathcal{D},f)}(h) > \epsilon\}.$$

In addition, let

$$M = \{S|_x : \exists h \in \mathcal{H}_B, \; L_S(h) = 0\}$$

be the set of misleading samples: namely, for every $S|_x \in M$, there is a "bad" hypothesis, $h \in \mathcal{H}_B$, that looks like a "good" hypothesis on $S|_x$. Now, recall that we would like to bound the probability of the event $L_{(\mathcal{D},f)}(h_S) > \epsilon$. But, since the realizability assumption implies that $L_S(h_S) = 0$, it follows that the event $L_{(\mathcal{D},f)}(h_S) > \epsilon$ can only happen if for some $h \in \mathcal{H}_B$ we have $L_S(h) = 0$ (indeed, in that case $h_S$ itself is such a hypothesis). In other words, this event will only happen if our sample is in the set of misleading samples, $M$. Formally, we have shown that

$$\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\} \subseteq M.$$

Note that we can rewrite $M$ as

$$M = \bigcup_{h \in \mathcal{H}_B} \{S|_x : L_S(h) = 0\}. \qquad (2.5)$$
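
As a sanity check, the following exhaustive enumeration (illustrative, not from the book) verifies both the inclusion $\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\} \subseteq M$ and the union form (2.5) in a tiny assumed setup: four domain points, threshold hypotheses, $f = h_2$, $\epsilon = 0.3$, and $m = 2$.

```python
# Illustrative check (not from the book): build M as the union in (2.5)
# and confirm it contains every sample on which ERM fails.
# Assumptions: X = {0,1,2,3} uniform, H = thresholds h_t(x) = 1[x >= t],
# f = h_2, eps = 0.3, m = 2.
from itertools import product

X, thresholds, eps, m = [0, 1, 2, 3], range(5), 0.3, 2
f = lambda x: int(x >= 2)
true_loss = lambda t: sum((x >= t) != f(x) for x in X) / len(X)
emp_loss = lambda t, S: sum((x >= t) != f(x) for x in S) / len(S)
erm = lambda S: next(t for t in thresholds if emp_loss(t, S) == 0)

H_B = [t for t in thresholds if true_loss(t) > eps]   # "bad" hypotheses

# M = union over h in H_B of {S|x : L_S(h) = 0}, as in Equation (2.5)
M = {S for t in H_B for S in product(X, repeat=m) if emp_loss(t, S) == 0}

failures = {S for S in product(X, repeat=m) if true_loss(erm(S)) > eps}
print(failures <= M)   # True: misleading samples cover all ERM failures
```

In this run the inclusion is strict: a sample drawn entirely from $\{0, 1\}$ misleads the bad hypothesis $h_4$, yet ERM still returns a good predictor, so not every misleading sample causes a failure.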

Hence,

$$\mathcal{D}^m\bigl(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\bigr) \le \mathcal{D}^m(M) = \mathcal{D}^m\Bigl(\bigcup_{h \in \mathcal{H}_B} \{S|_x : L_S(h) = 0\}\Bigr). \qquad (2.6)$$
Next, we upper bound the right-hand side of the preceding equation using the union bound, a basic property of probabilities.


Lemma 2.2 (Union Bound)  For any two sets $A, B$ and a distribution $\mathcal{D}$ we have

$$\mathcal{D}(A \cup B) \le \mathcal{D}(A) + \mathcal{D}(B).$$
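
As a quick numeric illustration (assuming a uniform distribution over ten points; not an example from the book):

```python
# Union bound on a uniform distribution over {0,...,9} (illustrative).
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
D = lambda E: len(E) / 10            # probability of an event E
print(D(A | B), "<=", D(A) + D(B))   # 0.6 <= 0.8; slack is D(A & B) = 0.2
```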

Applying the union bound (which extends to any finite union by induction) to the right-hand side of Equation (2.6) yields

$$\mathcal{D}^m\bigl(\{S|_x : L_{(\mathcal{D},f)}(h_S) > \epsilon\}\bigr) \le \sum_{h \in \mathcal{H}_B} \mathcal{D}^m\bigl(\{S|_x : L_S(h) = 0\}\bigr). \qquad (2.7)$$
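
To see (2.7) concretely, this exhaustive check (same illustrative toy setup and assumptions as before) computes both sides exactly:

```python
# Illustrative exhaustive check of Equation (2.7): the ERM failure
# probability is at most the sum, over bad hypotheses, of the
# probability of drawing a sample on which that hypothesis looks perfect.
# Assumptions as before: X = {0,1,2,3} uniform, thresholds, f = h_2.
from itertools import product

X, thresholds, eps, m = [0, 1, 2, 3], range(5), 0.3, 2
f = lambda x: int(x >= 2)
true_loss = lambda t: sum((x >= t) != f(x) for x in X) / len(X)
emp_loss = lambda t, S: sum((x >= t) != f(x) for x in S) / len(S)
erm = lambda S: next(t for t in thresholds if emp_loss(t, S) == 0)

samples = list(product(X, repeat=m))        # D^m is uniform over X^m
H_B = [t for t in thresholds if true_loss(t) > eps]

lhs = sum(true_loss(erm(S)) > eps for S in samples) / len(samples)
rhs = sum(sum(emp_loss(t, S) == 0 for S in samples) / len(samples)
          for t in H_B)
print(lhs, "<=", rhs)                       # 0.25 <= 0.5 in this toy run
```

In this run each summand equals $(1 - L_{(\mathcal{D},f)}(h))^m$ (each of the two bad hypotheses errs on half the domain, giving $0.5^2 = 0.25$), which anticipates the per-hypothesis bound developed next.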

Next, let us bound each summand of the right-hand side of the preceding inequality. Fix some "bad" hypothesis $h \in \mathcal{H}_B$. The event $L_S(h) = 0$ is equivalent
