Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

336 Dimensionality Reduction


where in the last inequality we used Stirling’s approximation. Overall we obtained
that

|Q|≤(2k+ 1)d(π/e)d/^2 (d/2)−d/^22 −d. (23.10)

Now lets specifyk. For eachx∈B 2 (1) letv∈Qbe the vector whoseith element
is sign(xi)b|xi|kc/k. Then, for each element we have that|xi−vi| ≤ 1 /kand
thus

‖x−v‖≤


d
k

.

To ensure that the right-hand side will be at mostwe shall setk=d


d/e.
Plugging this value into Equation (23.10) we conclude that

|Q|≤(3


d/(2))d(π/e)d/^2 (d/2)−d/^2 =

(

3



π
2 e

)d

( 3



)d
.

Letxbe a vector that can be written asx=UαwithUbeing some orthonor-
mal matrix and‖α‖ 0 ≤s. Combining the earlier covering property and the JL
lemma (Lemma 23.4) enables us to show that a randomWwill not distort any
suchx.

lemma23.12 LetUbe an orthonormald×dmatrix and letI⊂[d]be a set
of indices of size|I|=s. LetSbe the span of{Ui:i∈I}, whereUiis theith
column ofU. Letδ∈(0,1),∈(0,1), andn∈Nsuch that

n≥ 24 log(2/δ) +slog(12/)
^2

.

Then, with probability of at least 1 −δover a choice of a random matrixW∈Rn,d
such that each element ofWis independently distributed according toN(0, 1 /n),
we have

sup
x∈S

∣∣

∣∣‖Wx‖
‖x‖

− 1

∣∣

∣∣< .

Proof It suffices to prove the lemma for allx∈Swith‖x‖= 1. We can write
x=UIαwhereα∈Rs,‖α‖ 2 = 1, andUIis the matrix whose columns are
{Ui :i ∈I}. Using Lemma 23.11 we know that there exists a setQof size
|Q|≤(12/)ssuch that

sup
α:‖α‖=1

minv∈Q‖α−v‖≤(/4).

But sinceUis orthogonal we also have that

sup
α:‖α‖=1

min
v∈Q

‖UIα−UIv‖≤(/4).

Applying Lemma 23.4 on the set{UIv:v∈Q}we obtain that fornsatisfying
Free download pdf