Understanding Machine Learning: From Theory to Algorithms

23.3 Compressed Sensing 333

theorem23.9 LetUbe an arbitrary fixedd×dorthonormal matrix, let,δ be scalars in(0,1), letsbe an integer in[d], and letnbe an integer that satisfies

n≥ 100 slog(40d/(δ )) ^2 LetW∈Rn,dbe a matrix s.t. each element ofW is distributed normally with zero mean and variance of 1 /n. Then, with proabability of at least 1 −δover the choice ofW, the matrixWUis(,s)-RIP.

23.3.1 Proofs*

Proof of Theorem 23.8

We follow a proof due to Cand`es (2008). Leth=x?−x. Given a vectorvand a set of indicesIwe denote byvIthe vector whoseith element isviifi∈Iand 0 otherwise. The first trick we use is to partition the set of indices [d] ={ 1 ,...,d}into disjoint sets of sizes. That is, we will write [d] =T 0 ∪·T 1 ∪·T 2 ...Td/s− 1 where for alli,|Ti|=s, and we assume for simplicity thatd/sis an integer. We define the partition as follows. InT 0 we put thesindices corresponding to theslargest elements in absolute values ofx(ties are broken arbitrarily). LetT 0 c= [d]\T 0. Next,T 1 will be thesindices corresponding to theslargest elements in absolute value ofhT 0 c. LetT 0 , 1 =T 0 ∪T 1 andT 0 c, 1 = [d]\T 0 , 1. Next,T 2 will correspond to theslargest elements in absolute value ofhT 0 c, 1. And, we will constructT 3 ,T 4 ,... in the same way. To prove the theorem we first need the following lemma, which shows that RIP also implies approximate orthogonality.

lemma23.10 LetWbe an(, 2 s)-RIP matrix. Then, for any two disjoint sets I,J, both of size at mosts, and for any vectoruwe have that〈WuI,WuJ〉 ≤ ‖uI‖ 2 ‖uJ‖ 2.

Proof W.l.o.g. assume‖uI‖ 2 =‖uJ‖ 2 = 1.

〈WuI,WuJ〉=

‖WuI+WuJ‖^22 −‖WuI−WuJ‖^22 4

.

But, since|J∪I| ≤ 2 swe get from the RIP condition that‖WuI+WuJ‖^22 ≤ (1 +)(‖uI‖^22 +‖uJ‖^22 ) = 2(1 +) and that−‖WuI−WuJ‖^22 ≤−(1−)(‖uI‖^22 + ‖uJ‖^22 ) =−2(1−), which concludes our proof.

We are now ready to prove the theorem. Clearly,

‖h‖ 2 =‖hT 0 , 1 +hT 0 c, 1 ‖ 2 ≤‖hT 0 , 1 ‖ 2 +‖hT 0 c, 1 ‖ 2. (23.5)

To prove the theorem we will show the following two claims:

Claim 1:.‖hT 0 c, 1 ‖ 2 ≤‖hT 0 ‖ 2 + 2s−^1 /^2 ‖x−xs‖ 1.

Claim 2:.‖hT 0 , 1 ‖ 2 ≤ 12 −ρρs−^1 /^2 ‖x−xs‖ 1.

Understanding Machine Learning: From Theory to Algorithms

23.3.1 Proofs*

Proof of Theorem 23.8

.

Get our desktop app

Company

Features

Documentation

Resources