Understanding Machine Learning: From Theory to Algorithms


Dimensionality Reduction


Proof We assume, by way of contradiction, that $\tilde{x} \neq x$. Since $x$ satisfies the constraints in the optimization problem that defines $\tilde{x}$, we clearly have that $\|\tilde{x}\|_0 \le \|x\|_0 \le s$. Therefore, $\|x - \tilde{x}\|_0 \le 2s$ and we can apply the RIP inequality on the vector $x - \tilde{x}$. But, since $W(x - \tilde{x}) = 0$, we get that $|0 - 1| \le \epsilon$, which leads to a contradiction.

The reconstruction scheme given in Theorem 23.6 seems to be inefficient, because we need to minimize a combinatorial objective (the sparsity of $v$). Quite surprisingly, it turns out that we can replace the combinatorial objective, $\|v\|_0$, with a convex objective, $\|v\|_1$, which leads to a linear programming problem that can be solved efficiently. This is stated formally in the following theorem.

Theorem 23.7 Assume that the conditions of Theorem 23.6 hold and that $\epsilon < \frac{1}{1+\sqrt{2}}$. Then,
$$ x \;=\; \operatorname*{argmin}_{v : Wv = y} \|v\|_0 \;=\; \operatorname*{argmin}_{v : Wv = y} \|v\|_1 . $$
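As a sketch of why the $\ell_1$ objective yields a linear program: split the variable into its positive and negative parts, $v = v^+ - v^-$ with $v^+, v^- \ge 0$, so that $\|v\|_1$ becomes a linear function of $(v^+, v^-)$. The function `basis_pursuit` below and the random instance it is run on are our own illustration under this standard reformulation, not code from the text; the Gaussian choice of $W$ is one way to obtain an RIP matrix with high probability (see the discussion of random matrices later in this section).

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(W, y):
    """min ||v||_1 subject to W v = y, solved as a linear program.

    Split v = v_plus - v_minus with v_plus, v_minus >= 0; at the optimum
    ||v||_1 = sum(v_plus) + sum(v_minus), so the objective is linear.
    """
    n, d = W.shape
    c = np.ones(2 * d)                  # minimize sum(v+) + sum(v-)
    A_eq = np.hstack([W, -W])           # enforce W (v+ - v-) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

# Illustrative instance: a scaled Gaussian matrix is RIP with high probability.
rng = np.random.default_rng(0)
n, d = 40, 100
W = rng.standard_normal((n, d)) / np.sqrt(n)
x = np.zeros(d)
x[[5, 37, 80]] = [1.5, -2.0, 0.7]       # a 3-sparse signal
y = W @ x                               # the compression of x
x_hat = basis_pursuit(W, y)             # l1 minimization recovers x
```

In the regime of Theorem 23.7 the minimizer of the linear program coincides with the sparse signal, so `x_hat` matches `x` up to solver tolerance.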

In fact, we will prove a stronger result, which holds even if $x$ is not a sparse vector.

Theorem 23.8 Let $\epsilon < \frac{1}{1+\sqrt{2}}$ and let $W$ be an $(\epsilon, 2s)$-RIP matrix. Let $x$ be an arbitrary vector and denote
$$ x_s \in \operatorname*{argmin}_{v : \|v\|_0 \le s} \|x - v\|_1 . $$
That is, $x_s$ is the vector which equals $x$ on the $s$ largest elements of $x$ and equals $0$ elsewhere. Let $y = Wx$ be the compression of $x$ and let
$$ x^\star \in \operatorname*{argmin}_{v : Wv = y} \|v\|_1 $$
be the reconstructed vector. Then,
$$ \|x^\star - x\|_2 \;\le\; 2\, s^{-1/2}\, \frac{1+\rho}{1-\rho}\, \|x - x_s\|_1 , $$
where $\rho = \sqrt{2}\,\epsilon/(1-\epsilon)$.

Note that in the special case that $x = x_s$ we get an exact recovery, $x^\star = x$, so Theorem 23.7 is a special case of Theorem 23.8. The proof of Theorem 23.8 is given in Section 23.3.1.
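For concreteness, the truncation $x_s$ (keep the $s$ largest-magnitude entries of $x$, zero out the rest) can be computed directly; this is a minimal sketch, and the name `top_s` is our own:

```python
import numpy as np

def top_s(x, s):
    """Return x_s: keep the s largest-magnitude entries of x, zero the rest."""
    xs = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-s:]    # indices of the s largest |x_i|
    xs[idx] = x[idx]
    return xs

x = np.array([0.1, -3.0, 2.0, 0.5])
xs = top_s(x, 2)                        # keeps -3.0 and 2.0, zeros the rest
```

The error bound of Theorem 23.8 is then controlled by the "tail" $\|x - x_s\|_1$, i.e., the total mass of the entries that the truncation discards.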
Finally, the third result tells us that random matrices with $n \ge \Omega(s \log(d))$ are likely to be RIP. In fact, the theorem shows that multiplying a random matrix by an orthonormal matrix also provides an RIP matrix. This is important for compressing signals of the form $x = U\alpha$, where $x$ is not sparse but $\alpha$ is sparse. In that case, if $W$ is a random matrix and we compress using $y = Wx$, then this is the same as compressing $\alpha$ by $y = (WU)\alpha$, and since $WU$ is also RIP we can reconstruct $\alpha$ (and thus also $x$) from $y$.
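The steps above can be sketched end to end. As an assumed instance (not from the text), take $U$ to be the orthonormal inverse-DCT basis, so $x = U\alpha$ is dense while $\alpha$ is sparse; we compress $x$ with a random $W$ and recover $\alpha$ by $\ell_1$ minimization over the matrix $WU$:

```python
import numpy as np
from scipy.fft import idct
from scipy.optimize import linprog

def l1_min(A, y):
    """min ||v||_1 subject to A v = y, via the standard split v = v+ - v-."""
    n, d = A.shape
    res = linprog(np.ones(2 * d), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * d))
    return res.x[:d] - res.x[d:]

d, n = 64, 32
U = idct(np.eye(d), norm="ortho", axis=0)   # orthonormal basis: U @ U.T = I
alpha = np.zeros(d)
alpha[[2, 10, 25]] = [1.0, -0.5, 2.0]       # sparse coefficient vector
x = U @ alpha                               # dense signal, sparse in basis U

rng = np.random.default_rng(1)
W = rng.standard_normal((n, d)) / np.sqrt(n)
y = W @ x                                   # compress x directly: y = (W U) alpha
alpha_hat = l1_min(W @ U, y)                # recover alpha, since W U is also RIP
x_hat = U @ alpha_hat                       # and thus recover x
```

The key point is that the sensing step never needs to know $U$: we compress $x$ itself, and only the reconstruction uses $WU$.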