2.3 Empirical Risk Minimization with Inductive Bias 39
commonly denoted by δ. We interpret the event L_(D,f)(h_S) > ε as a failure of the
learner, while if L_(D,f)(h_S) ≤ ε we view the output of the algorithm as an approx-
imately correct predictor. Therefore (fixing some labeling function f : X → Y),
we are interested in upper bounding the probability of sampling an m-tuple of in-
stances that will lead to failure of the learner. Formally, let S|x = (x_1, ..., x_m)
be the instances of the training set. We would like to upper bound

D^m({S|x : L_(D,f)(h_S) > ε}).
Let H_B be the set of "bad" hypotheses, that is,

H_B = {h ∈ H : L_(D,f)(h) > ε}.
In addition, let

M = {S|x : ∃ h ∈ H_B, L_S(h) = 0}

be the set of misleading samples: Namely, for every S|x ∈ M, there is a "bad"
hypothesis, h ∈ H_B, that looks like a "good" hypothesis on S|x. Now, recall that
we would like to bound the probability of the event L_(D,f)(h_S) > ε. But, since
the realizability assumption implies that L_S(h_S) = 0, it follows that the event
L_(D,f)(h_S) > ε can only happen if for some h ∈ H_B we have L_S(h) = 0. In
other words, this event will only happen if our sample is in the set of misleading
samples, M. Formally, we have shown that

{S|x : L_(D,f)(h_S) > ε} ⊆ M.
Note that we can rewrite M as

M = ⋃_{h ∈ H_B} {S|x : L_S(h) = 0}.    (2.5)
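To make the set identity in Equation (2.5) concrete, the following sketch enumerates all m-tuples over a toy finite domain and checks that M, defined directly as the set of samples admitting a zero-empirical-error "bad" hypothesis, coincides with the union over H_B. The uniform distribution, the threshold hypothesis class, and all names here are illustrative assumptions, not part of the text.

```python
# Toy check of Equation (2.5): M defined directly equals the union over H_B.
# The domain, labeling function, and hypothesis class are illustrative choices.
from itertools import product

X = [0, 1, 2, 3]                       # finite instance domain
f = lambda x: 1 if x >= 2 else 0       # true labeling function
# H: threshold classifiers h_t(x) = 1 iff x >= t, for t = 0, ..., 4
H = [lambda x, t=t: 1 if x >= t else 0 for t in range(5)]

eps, m = 0.3, 2                        # accuracy parameter and sample size

def true_error(h):                     # L_(D,f)(h) under a uniform D
    return sum(h(x) != f(x) for x in X) / len(X)

def empirical_error(h, sample):        # L_S(h)
    return sum(h(x) != f(x) for x in sample) / len(sample)

H_bad = [h for h in H if true_error(h) > eps]

# M defined directly: samples on which some bad hypothesis has L_S(h) = 0
M_direct = {s for s in product(X, repeat=m)
            if any(empirical_error(h, s) == 0 for h in H_bad)}

# M as the union in Equation (2.5)
M_union = set()
for h in H_bad:
    M_union |= {s for s in product(X, repeat=m)
                if empirical_error(h, s) == 0}

assert M_direct == M_union
```

In this toy instance the bad hypotheses are the two constant-like thresholds (t = 0 and t = 4), and a sample is misleading exactly when all its instances carry the same label.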
Hence,

D^m({S|x : L_(D,f)(h_S) > ε}) ≤ D^m(M) = D^m(⋃_{h ∈ H_B} {S|x : L_S(h) = 0}).    (2.6)
Next, we upper bound the right-hand side of the preceding equation using the
union bound, a basic property of probabilities.
Lemma 2.2 (Union Bound)  For any two sets A, B and a distribution D we
have

D(A ∪ B) ≤ D(A) + D(B).
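Lemma 2.2 bounds the measure of a union of two sets, while Equation (2.7) takes the union over all of H_B. Iterating the two-set bound (by induction on the number of sets, assumed finite here, as when H itself is finite) gives the form actually used, with A_h = {S|x : L_S(h) = 0} for h ∈ H_B:

```latex
D\Bigl(\bigcup_{i=1}^{k} A_i\Bigr)
  \;\le\; D(A_1) + D\Bigl(\bigcup_{i=2}^{k} A_i\Bigr)
  \;\le\; \dots \;\le\; \sum_{i=1}^{k} D(A_i).
```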
Applying the union bound to the right-hand side of Equation (2.6) yields

D^m({S|x : L_(D,f)(h_S) > ε}) ≤ Σ_{h ∈ H_B} D^m({S|x : L_S(h) = 0}).    (2.7)
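Inequality (2.7) can be checked exactly on a small example by enumerating every m-tuple under a uniform distribution. The setup below (threshold hypotheses over a four-point domain, and an ERM rule that breaks ties by list order) is an illustrative assumption; `lhs` is the mass of the failure event and `rhs` is the sum on the right-hand side of (2.7).

```python
# Exact check of inequality (2.7) on a toy problem by full enumeration.
# The domain, hypothesis class, and ERM tie-breaking are illustrative choices.
from itertools import product

X = [0, 1, 2, 3]
f = lambda x: 1 if x >= 2 else 0
H = [lambda x, t=t: 1 if x >= t else 0 for t in range(5)]
eps, m = 0.3, 3
p = 1 / len(X)                          # uniform D: each m-tuple has mass p**m

def true_error(h):                      # L_(D,f)(h)
    return sum(h(x) != f(x) for x in X) / len(X)

def empirical_error(h, sample):         # L_S(h)
    return sum(h(x) != f(x) for x in sample) / len(sample)

H_bad = [h for h in H if true_error(h) > eps]

lhs = rhs = 0.0
for s in product(X, repeat=m):
    # ERM: pick the first hypothesis attaining minimal empirical error on s
    h_S = min(H, key=lambda h: empirical_error(h, s))
    if true_error(h_S) > eps:
        lhs += p**m                     # mass of the failure event in (2.7)
for h in H_bad:                         # right-hand side of (2.7)
    rhs += sum(p**m for s in product(X, repeat=m)
               if empirical_error(h, s) == 0)

assert lhs <= rhs
```

Note that the right-hand side counts a sample once per bad hypothesis it misleads, which is one reason the union bound can be loose.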
Next, let us bound each summand of the right-hand side of the preceding in-
equality. Fix some "bad" hypothesis h ∈ H_B. The event L_S(h) = 0 is equivalent