Understanding Machine Learning: From Theory to Algorithms


Covering Numbers


We can now write
$$
\begin{aligned}
R(A) &= \frac{1}{m}\,\mathbb{E}\,\langle \sigma, a^* \rangle \\
&= \frac{1}{m}\,\mathbb{E}\left[ \langle \sigma, a^* - b^{(M)} \rangle + \sum_{k=1}^{M} \langle \sigma, b^{(k)} - b^{(k-1)} \rangle \right] \\
&\le \frac{1}{m}\,\mathbb{E}\left[ \|\sigma\| \, \|a^* - b^{(M)}\| \right] + \sum_{k=1}^{M} \frac{1}{m}\,\mathbb{E}\left[ \sup_{a \in \hat{B}_k} \langle \sigma, a \rangle \right].
\end{aligned}
$$
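To see why the second equality holds, note that the sum telescopes. Assuming, as in the construction preceding this display, that the scale-$c$ cover can be taken to be the single point $b^{(0)} = 0$ (every $a \in A$ satisfies $\|a\| \le c$), the intermediate terms cancel:
$$
\langle \sigma, a^* - b^{(M)} \rangle + \sum_{k=1}^{M} \langle \sigma, b^{(k)} - b^{(k-1)} \rangle
= \langle \sigma, a^* - b^{(0)} \rangle = \langle \sigma, a^* \rangle .
$$
The inequality then follows by applying Cauchy-Schwarz to the first term and bounding each difference $b^{(k)} - b^{(k-1)}$ by the supremum over $\hat{B}_k$.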

Since $\|\sigma\| = \sqrt{m}$ and $\|a^* - b^{(M)}\| \le c\,2^{-M}$, the first summand is at most $\frac{c}{\sqrt{m}}\,2^{-M}$. Additionally, each element of $\hat{B}_k$ has norm at most $3c\,2^{-k}$ and $|\hat{B}_k| \le N(c\,2^{-k}, A)^2$, so by Massart's lemma,
$$
\frac{1}{m}\,\mathbb{E} \sup_{a \in \hat{B}_k} \langle \sigma, a \rangle
\;\le\; 3c\,2^{-k} \sqrt{\frac{2 \log\!\big(N(c\,2^{-k}, A)^2\big)}{m}}
\;=\; 6c\,2^{-k} \sqrt{\frac{\log\!\big(N(c\,2^{-k}, A)\big)}{m}},
$$
where the equality uses $\sqrt{2 \log(N^2)} = 2\sqrt{\log N}$.

Therefore,
$$
R(A) \le \frac{c\,2^{-M}}{\sqrt{m}} + \frac{6c}{\sqrt{m}} \sum_{k=1}^{M} 2^{-k} \sqrt{\log\!\big(N(c\,2^{-k}, A)\big)} .
$$
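As a quick numeric sanity check of this bound (the conclusion of Lemma 27.4), here is a minimal Python sketch. The helper names `chaining_bound` and `log_cover` are illustrative, not from the book; `log_cover(k)` is a user-supplied model of $\log N(c\,2^{-k}, A)$:

```python
import math

def chaining_bound(c, m, log_cover, M):
    """Evaluate c*2^(-M)/sqrt(m) + (6c/sqrt(m)) * sum_{k=1..M} 2^(-k) * sqrt(log N(c*2^(-k), A)).

    log_cover(k) models log N(c * 2^(-k), A) and must be nonnegative.
    """
    tail = c * 2.0 ** (-M) / math.sqrt(m)  # first summand: c 2^(-M) / sqrt(m)
    series = sum(2.0 ** (-k) * math.sqrt(log_cover(k)) for k in range(1, M + 1))
    return tail + (6.0 * c / math.sqrt(m)) * series
```

As $M$ grows, the first term vanishes, and the series converges whenever $\sqrt{\log N(c\,2^{-k}, A)}$ grows at most linearly in $k$, which is exactly the condition of the next lemma.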

As a corollary we obtain the following:

Lemma 27.5  Assume that there are $\alpha, \beta > 0$ such that for any $k \ge 1$ we have
$$
\sqrt{\log\!\big(N(c\,2^{-k}, A)\big)} \le \alpha + \beta k .
$$
Then,
$$
R(A) \le \frac{6c}{\sqrt{m}}\,(\alpha + 2\beta) .
$$

Proof  The bound follows from Lemma 27.4 by taking $M \to \infty$ and noting that $\sum_{k=1}^{\infty} 2^{-k} = 1$ and $\sum_{k=1}^{\infty} k\,2^{-k} = 2$: the first summand vanishes, and the series is at most $\sum_{k=1}^{\infty} 2^{-k}(\alpha + \beta k) = \alpha + 2\beta$.
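The two series identities used in the proof can be checked symbolically; a small sketch, assuming the sympy library is available (it is not part of the text):

```python
import sympy as sp

k = sp.symbols('k', positive=True, integer=True)

# Geometric series: sum_{k>=1} 2^(-k) = 1
assert sp.Sum(2 ** (-k), (k, 1, sp.oo)).doit() == 1

# Weighted series: sum_{k>=1} k * 2^(-k) = 2
assert sp.Sum(k * 2 ** (-k), (k, 1, sp.oo)).doit() == 2
```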

Example 27.2  Consider a set $A$ which lies in a $d$-dimensional subspace of $\mathbb{R}^m$ and such that $c = \max_{a \in A} \|a\|$. We have shown that $N(r, A) \le \left( \frac{2c\sqrt{d}}{r} \right)^{d}$. Therefore, for any $k$,
$$
\sqrt{\log\!\big(N(c\,2^{-k}, A)\big)}
\le \sqrt{d \log\!\big(2^{k+1}\sqrt{d}\big)}
\le \sqrt{d \log\!\big(2\sqrt{d}\big) + kd}
\le \sqrt{d \log\!\big(2\sqrt{d}\big)} + \sqrt{d}\,k .
$$
Hence Lemma 27.5, with $\alpha = \sqrt{d \log(2\sqrt{d})}$ and $\beta = \sqrt{d}$, yields
$$
R(A) \le \frac{6c}{\sqrt{m}} \left( \sqrt{d \log\!\big(2\sqrt{d}\big)} + 2\sqrt{d} \right)
= O\!\left( c\,\sqrt{\frac{d \log(d)}{m}} \right).
$$
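To see the example's bound in action, here is a self-contained numeric sketch; the sizes `c`, `m`, `d` and the number of levels `M` are hypothetical choices for illustration, not values from the text:

```python
import math

# Hypothetical sizes, chosen only for illustration.
c, m, d, M = 1.0, 1000.0, 5, 60

def log_cover(k):
    # The example's covering model: log N(c 2^(-k), A) <= d log(2^(k+1) sqrt(d)).
    return d * math.log(2 ** (k + 1) * math.sqrt(d))

# Lemma 27.4's chaining bound with M levels.
chaining = c * 2.0 ** (-M) / math.sqrt(m) + (6.0 * c / math.sqrt(m)) * sum(
    2.0 ** (-k) * math.sqrt(log_cover(k)) for k in range(1, M + 1)
)

# Lemma 27.5's closed form with alpha = sqrt(d log(2 sqrt(d))), beta = sqrt(d).
closed_form = (6.0 * c / math.sqrt(m)) * (
    math.sqrt(d * math.log(2.0 * math.sqrt(d))) + 2.0 * math.sqrt(d)
)

# The relaxations in the example can only loosen the bound.
assert chaining <= closed_form
print(f"chaining bound: {chaining:.4f}   closed form: {closed_form:.4f}")
```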