Understanding Machine Learning: From Theory to Algorithms


Learning via Uniform Convergence


i.i.d. from D we have that for all h ∈ H, |L_S(h) − L_D(h)| ≤ ε. That is,

D^m({S : ∀h ∈ H, |L_S(h) − L_D(h)| ≤ ε}) ≥ 1 − δ.

Equivalently, we need to show that

D^m({S : ∃h ∈ H, |L_S(h) − L_D(h)| > ε}) < δ.

Writing

{S : ∃h ∈ H, |L_S(h) − L_D(h)| > ε} = ∪_{h∈H} {S : |L_S(h) − L_D(h)| > ε},

and applying the union bound (Lemma 2.2) we obtain

D^m({S : ∃h ∈ H, |L_S(h) − L_D(h)| > ε}) ≤ ∑_{h∈H} D^m({S : |L_S(h) − L_D(h)| > ε}).    (4.1)
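Inequality (4.1) is just the union bound applied to the "bad" event for each hypothesis, so it can be checked numerically. Below is a minimal Monte Carlo sketch (not from the book): the distribution D, the finite class of threshold classifiers, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: x uniform on [0, 1], label y = 1[x > 0.5],
# finite class H of thresholds h_t(x) = 1[x > t].
thresholds = np.linspace(0.1, 0.9, 9)
true_risk = np.abs(thresholds - 0.5)  # L_D(h_t) under this D

m, eps, trials = 50, 0.15, 2000
lhs_events = 0                           # samples S with SOME violating h
rhs_counts = np.zeros(len(thresholds))   # per-h violation counts

for _ in range(trials):
    x = rng.random(m)
    y = (x > 0.5).astype(int)
    preds = x[None, :] > thresholds[:, None]   # (|H|, m) predictions
    emp_risk = (preds != y).mean(axis=1)       # L_S(h) for every h
    viol = np.abs(emp_risk - true_risk) > eps
    lhs_events += viol.any()
    rhs_counts += viol

lhs = lhs_events / trials       # estimate of D^m(exists violating h)
rhs = rhs_counts.sum() / trials  # sum over h of per-h probabilities
print(lhs <= rhs)  # the union bound (4.1): always True
```

The comparison holds on every run, not just in expectation, because the indicator of a union never exceeds the sum of the individual indicators.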
Our second step will be to argue that each summand of the right-hand side of this inequality is small enough (for a sufficiently large m). That is, we will show that for any fixed hypothesis h (which is chosen in advance, prior to the sampling of the training set), the gap between the true and empirical risks, |L_S(h) − L_D(h)|, is likely to be small.
Recall that L_D(h) = E_{z∼D}[ℓ(h, z)] and that L_S(h) = (1/m) ∑_{i=1}^m ℓ(h, z_i). Since each z_i is sampled i.i.d. from D, the expected value of the random variable ℓ(h, z_i) is L_D(h). By the linearity of expectation, it follows that L_D(h) is also the expected value of L_S(h). Hence, the quantity |L_D(h) − L_S(h)| is the deviation of the random variable L_S(h) from its expectation. We therefore need to show that the measure of L_S(h) is concentrated around its expected value.
A basic statistical fact, the law of large numbers, states that when m goes to infinity, empirical averages converge to their true expectation. This is true for L_S(h), since it is the empirical average of m i.i.d. random variables. However, since the law of large numbers is only an asymptotic result, it provides no information about the gap between the empirically estimated error and its true value for any given, finite, sample size.
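The convergence of L_S(h) to L_D(h) is easy to observe in simulation. In the short sketch below the losses ℓ(h, z_i) and their distribution are illustrative assumptions (Bernoulli with mean 0.3), not anything specified in the text.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative assumption: losses ℓ(h, z_i) ~ Bernoulli(0.3),
# so the true risk is L_D(h) = 0.3 exactly.
mu = 0.3
devs = []
for m in (10, 100, 10_000):
    # 5000 independent samples S of size m; average the gap |L_S(h) - L_D(h)|
    losses = rng.random((5000, m)) < mu
    devs.append(float(np.abs(losses.mean(axis=1) - mu).mean()))
print(devs)  # the gap shrinks roughly like 1/sqrt(m)
```

The law of large numbers only promises the last entry tends to 0 as m → ∞; the 1/√m rate visible here is exactly the kind of finite-sample information that Hoeffding's inequality supplies.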
Instead, we will use a measure concentration inequality due to Hoeffding, which
quantifies the gap between empirical averages and their expected value.

lemma 4.5 (Hoeffding's Inequality)  Let θ_1, ..., θ_m be a sequence of i.i.d. random variables and assume that for all i, E[θ_i] = μ and P[a ≤ θ_i ≤ b] = 1. Then, for any ε > 0,

P[ |(1/m) ∑_{i=1}^m θ_i − μ| > ε ] ≤ 2 exp(−2mε²/(b−a)²).

The proof can be found in Appendix B.
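The bound of Lemma 4.5 can be checked against simulation. In the sketch below the choice θ_i ~ Uniform[a, b] and all parameter values are assumptions made purely for illustration.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

# Illustrative assumption: θ_i ~ Uniform[a, b], hence μ = (a + b) / 2.
a, b, m, eps, trials = 0.0, 1.0, 200, 0.1, 20_000
mu = (a + b) / 2

# Empirical probability that the sample mean deviates from μ by more than eps
means = rng.uniform(a, b, size=(trials, m)).mean(axis=1)
p_emp = float((np.abs(means - mu) > eps).mean())

# Hoeffding's bound: 2 exp(-2 m eps^2 / (b - a)^2)
bound = 2 * math.exp(-2 * m * eps**2 / (b - a) ** 2)
print(p_emp, "<=", bound)
```

For these parameters the bound is about 0.037, while the observed frequency is far smaller; Hoeffding's inequality is a worst-case guarantee over all distributions supported on [a, b], so some slack is expected.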
Getting back to our problem, let θ_i be the random variable ℓ(h, z_i). Since h is fixed and z_1, ..., z_m are sampled i.i.d., it follows that θ_1, ..., θ_m are also i.i.d. random variables. Furthermore, L_S(h) = (1/m) ∑_{i=1}^m θ_i and L_D(h) = μ. Let us