Understanding Machine Learning: From Theory to Algorithms


Dimensionality Reduction


Proof We assume, by way of contradiction, that $\tilde{x} \neq x$. Since $x$ satisfies the constraints in the optimization problem that defines $\tilde{x}$, we clearly have that $\|\tilde{x}\|_0 \le \|x\|_0 \le s$. Therefore, $\|x - \tilde{x}\|_0 \le 2s$ and we can apply the RIP inequality on the vector $x - \tilde{x}$. But, since $W(x - \tilde{x}) = 0$, we get that $|0 - 1| \le \epsilon$, which leads to a contradiction.

The reconstruction scheme given in Theorem 23.6 seems to be inefficient, because we need to minimize a combinatorial objective (the sparsity of $v$). Quite surprisingly, it turns out that we can replace the combinatorial objective, $\|v\|_0$, with a convex objective, $\|v\|_1$, which leads to a linear programming problem that can be solved efficiently. This is stated formally in the following theorem.

Theorem 23.7 Assume that the conditions of Theorem 23.6 hold and that $\epsilon < \frac{1}{1+\sqrt{2}}$. Then,
$$ x \;=\; \operatorname*{argmin}_{v : Wv = y} \|v\|_0 \;=\; \operatorname*{argmin}_{v : Wv = y} \|v\|_1 . $$
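As a sketch of why the $\ell_1$ objective yields a linear program: split the variable into its positive and negative parts, $v = v^+ - v^-$ with $v^+, v^- \ge 0$, so that $\|v\|_1$ becomes a linear function of $(v^+, v^-)$. The function `basis_pursuit` below and the random instance it is run on are our own illustration under this standard reformulation, not code from the text; the Gaussian choice of $W$ is one way to obtain an RIP matrix with high probability (see the discussion of random matrices later in this section).

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(W, y):
    """min ||v||_1 subject to W v = y, solved as a linear program.

    Split v = v_plus - v_minus with v_plus, v_minus >= 0; at the optimum
    ||v||_1 = sum(v_plus) + sum(v_minus), so the objective is linear.
    """
    n, d = W.shape
    c = np.ones(2 * d)                  # minimize sum(v+) + sum(v-)
    A_eq = np.hstack([W, -W])           # enforce W (v+ - v-) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

# Illustrative instance: a scaled Gaussian matrix is RIP with high probability.
rng = np.random.default_rng(0)
n, d = 40, 100
W = rng.standard_normal((n, d)) / np.sqrt(n)
x = np.zeros(d)
x[[5, 37, 80]] = [1.5, -2.0, 0.7]       # a 3-sparse signal
y = W @ x                               # the compression of x
x_hat = basis_pursuit(W, y)             # l1 minimization recovers x
```

In the regime of Theorem 23.7 the minimizer of the linear program coincides with the sparse signal, so `x_hat` matches `x` up to solver tolerance.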

In fact, we will prove a stronger result, which holds even if $x$ is not a sparse vector.

Theorem 23.8 Let $\epsilon < \frac{1}{1+\sqrt{2}}$ and let $W$ be an $(\epsilon, 2s)$-RIP matrix. Let $x$ be an arbitrary vector and denote
$$ x_s \in \operatorname*{argmin}_{v : \|v\|_0 \le s} \|x - v\|_1 . $$
That is, $x_s$ is the vector which equals $x$ on the $s$ largest elements of $x$ and equals $0$ elsewhere. Let $y = Wx$ be the compression of $x$ and let
$$ x^\star \in \operatorname*{argmin}_{v : Wv = y} \|v\|_1 $$
be the reconstructed vector. Then,
$$ \|x^\star - x\|_2 \;\le\; 2\, s^{-1/2}\, \frac{1+\rho}{1-\rho}\, \|x - x_s\|_1 , $$
where $\rho = \sqrt{2}\,\epsilon/(1-\epsilon)$.

Note that in the special case that $x = x_s$ we get an exact recovery, $x^\star = x$, so Theorem 23.7 is a special case of Theorem 23.8. The proof of Theorem 23.8 is given in Section 23.3.1.
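For concreteness, the truncation $x_s$ (keep the $s$ largest-magnitude entries of $x$, zero out the rest) can be computed directly; this is a minimal sketch, and the name `top_s` is our own:

```python
import numpy as np

def top_s(x, s):
    """Return x_s: keep the s largest-magnitude entries of x, zero the rest."""
    xs = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]    # indices of the s largest |x_i|
    xs[idx] = x[idx]
    return xs

x = np.array([0.1, -3.0, 2.0, 0.5])
xs = top_s(x, 2)                        # keeps -3.0 and 2.0, zeros the rest
```

The error bound of Theorem 23.8 is then controlled by the "tail" $\|x - x_s\|_1$, i.e., the total mass of the entries that the truncation discards.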
Finally, the third result tells us that random matrices with $n \ge \Omega(s \log(d))$ are likely to be RIP. In fact, the theorem shows that multiplying a random matrix by an orthonormal matrix also provides an RIP matrix. This is important for compressing signals of the form $x = U\alpha$, where $x$ is not sparse but $\alpha$ is sparse. In that case, if $W$ is a random matrix and we compress using $y = Wx$, then this is the same as compressing $\alpha$ by $y = (WU)\alpha$, and since $WU$ is also RIP we can reconstruct $\alpha$ (and thus also $x$) from $y$.
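The steps above can be sketched end to end. As an assumed instance (not from the text), take $U$ to be the orthonormal inverse-DCT basis, so $x = U\alpha$ is dense while $\alpha$ is sparse; we compress $x$ with a random $W$ and recover $\alpha$ by $\ell_1$ minimization over the matrix $WU$:

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

def l1_min(A, y):
    """min ||v||_1 subject to A v = y, via the standard split v = v+ - v-."""
    n, d = A.shape
    res = linprog(np.ones(2 * d), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

d, n = 64, 32
U = idct(np.eye(d), norm="ortho", axis=0)   # orthonormal basis: U @ U.T = I
alpha = np.zeros(d)
alpha[[2, 10, 25]] = [1.0, -0.5, 2.0]       # sparse coefficient vector
x = U @ alpha                               # dense signal, sparse in basis U

rng = np.random.default_rng(1)
W = rng.standard_normal((n, d)) / np.sqrt(n)
y = W @ x                                   # compress x directly: y = (W U) alpha
alpha_hat = l1_min(W @ U, y)                # recover alpha, since W U is also RIP
x_hat = U @ alpha_hat                       # and thus recover x
```

The key point is that the sensing step never needs to know $U$: we compress $x$ itself, and only the reconstruction uses $WU$.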