Understanding Machine Learning: From Theory to Algorithms

336 Dimensionality Reduction

where in the last inequality we used Stirling’s approximation. Overall we obtained that

|Q|≤(2k+ 1)d(π/e)d/^2 (d/2)−d/^22 −d. (23.10)

Now lets specifyk. For eachx∈B 2 (1) letv∈Qbe the vector whoseith element is sign(xi)b|xi|kc/k. Then, for each element we have that|xi−vi| ≤ 1 /kand thus

‖x−v‖≤

√

d k

.

To ensure that the right-hand side will be at mostwe shall setk=d

√

d/e. Plugging this value into Equation (23.10) we conclude that

|Q|≤(3

√

d/(2))d(π/e)d/^2 (d/2)−d/^2 =

(

3

√

π 2 e

)d ≤

( 3

)d .

Letxbe a vector that can be written asx=UαwithUbeing some orthonor- mal matrix and‖α‖ 0 ≤s. Combining the earlier covering property and the JL lemma (Lemma 23.4) enables us to show that a randomWwill not distort any suchx.

lemma23.12 LetUbe an orthonormald×dmatrix and letI⊂[d]be a set of indices of size|I|=s. LetSbe the span of{Ui:i∈I}, whereUiis theith column ofU. Letδ∈(0,1),∈(0,1), andn∈Nsuch that

n≥ 24 log(2/δ) +slog(12/) ^2

.

Then, with probability of at least 1 −δover a choice of a random matrixW∈Rn,d such that each element ofWis independently distributed according toN(0, 1 /n), we have

sup x∈S

∣∣

∣∣‖Wx‖ ‖x‖

− 1

∣∣

∣∣< .

Proof It suffices to prove the lemma for allx∈Swith‖x‖= 1. We can write x=UIαwhereα∈Rs,‖α‖ 2 = 1, andUIis the matrix whose columns are {Ui :i ∈I}. Using Lemma 23.11 we know that there exists a setQof size |Q|≤(12/)ssuch that

sup α:‖α‖=1

minv∈Q‖α−v‖≤(/4).

But sinceUis orthogonal we also have that

sup α:‖α‖=1

min v∈Q

‖UIα−UIv‖≤(/4).

Applying Lemma 23.4 on the set{UIv:v∈Q}we obtain that fornsatisfying

Understanding Machine Learning: From Theory to Algorithms

√

.

√

√

(

√

( 3

.

∣∣

− 1

∣∣

∣∣< .

Get our desktop app

Company

Features

Documentation

Resources

Understanding Machine Learning: From Theory to Algorithms

√

.

√

√

(

√

( 3

.

∣∣

− 1

∣∣

∣∣< .

Get our desktop app

Company

Features

Documentation

Resources

∣∣< .