Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

390 Covering Numbers


We can now write

R(A) =

1

mE〈σ,a

∗〉

=

1

mE

[

〈σ,a∗−b(M)〉+

∑M

k=1

〈σ,b(k)−b(k−1)〉

]


1

m

E

[

‖σ‖‖a∗−b(M)‖

]

+

∑M

k=1

1

m

E

[

sup
a∈Bˆk

〈σ,a〉

]

.

Since‖σ‖=


mand‖a∗−b(M)‖ ≤c 2 −M, the first summand is at most
√c
m^2

−M. Additionally, by Massart lemma,

1

m

Esup
a∈Bˆk

〈σ,a〉≤ 3 c 2 −k


2 log(N(c 2 −k,A)^2 )
m

= 6c 2 −k


log(N(c 2 −k,A))
m

.

Therefore,

R(A)≤

c 2 −M

m

+

6 c
m

∑M

k=1

2 −k


log(N(c 2 −k,A)).

As a corollary we obtain the following:
lemma27.5 Assume that there areα,β > 0 such that for anyk≥ 1 we have

log(N(c 2 −k,A))≤α+βk.
Then,

R(A)≤

6 c
m(α+ 2β).
Proof∑ The bound follows from Lemma 27.4 by takingM→∞and noting that

k=1^2

−k= 1 and∑∞
k=1k^2

−k= 2.

Example 27.2 Consider a setAwhich lies in addimensional subspace ofRm
and such thatc= maxa∈A‖a‖. We have shown thatN(r,A)≤

(

2 c√d
r

)d

. There-
fore, for anyk,

log(N(c 2 −k,A))≤



dlog

(

2 k+1


d

)



dlog(2


d) +


k d



dlog(2


d) +


dk.
Hence Lemma 27.5 yields

R(A) ≤

6 c
m

(√

dlog(2


d) + 2


d

)

= O

(

c


dlog(d)
m

)

.
Free download pdf