Understanding Machine Learning: From Theory to Algorithms


23.6 Bibliographic Remarks


PCA is equivalent to best subspace approximation using singular value decomposition (SVD). The SVD method is described in Appendix C. SVD dates back to Eugenio Beltrami (1873) and Camille Jordan (1874). It has been rediscovered many times. In the statistical literature, it was introduced by Pearson (1901). Besides PCA and SVD, there are additional names that refer to the same idea, used in different scientific communities. A few examples are the Eckart-Young theorem (after Carl Eckart and Gale Young, who analyzed the method in 1936), the Schmidt-Mirsky theorem, factor analysis, and the Hotelling transform.
Compressed sensing was introduced in Donoho (2006) and in Candes & Tao (2005). See also Candes (2006).

23.7 Exercises



  1. In this exercise we show that, in the general case, exact recovery of a linear
    compression scheme is impossible.

    1. Let A ∈ R^{n,d} be an arbitrary compression matrix, where n ≤ d − 1. Show
      that there exist u, v ∈ R^d with u ≠ v such that Au = Av.

    2. Conclude that exact recovery of a linear compression scheme is impossible.
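The argument in part 1 can be checked numerically: any A with fewer rows than columns has a nontrivial null space, and adding a null-space vector to any u produces a distinct v that A cannot tell apart. The following is a minimal NumPy sketch; the dimensions, the random matrix A, and the seed are illustrative choices, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3, 5                     # n <= d - 1, so A must lose information
A = rng.standard_normal((n, d))

# The last right-singular vectors span null(A) when n < d.
_, _, Vt = np.linalg.svd(A)
w = Vt[-1]                      # unit vector with A @ w ≈ 0

u = rng.standard_normal(d)
v = u + w                       # u != v, yet both compress to the same point
print(np.allclose(A @ u, A @ v))  # True: exact recovery is impossible
```

Since no decoder can map the single compressed point A u back to both u and v, this confirms part 2.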



  2. Let α ∈ R^d be such that α_1 ≥ α_2 ≥ ··· ≥ α_d ≥ 0. Show that

        max_{β ∈ [0,1]^d : ‖β‖_1 ≤ n}  ∑_{j=1}^d α_j β_j  =  ∑_{j=1}^n α_j.

    Hint: Take every vector β ∈ [0,1]^d such that ‖β‖_1 ≤ n. Let i be the minimal
    index for which β_i < 1. If i = n + 1 we are done. Otherwise, show that we can
    increase β_i, while possibly decreasing β_j for some j > i, and obtain a better
    solution. This will imply that the optimal solution is to set β_i = 1 for i ≤ n
    and β_i = 0 for i > n.
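The claimed optimum, putting all the weight on the n largest coordinates of α, can be sanity-checked by sampling random feasible β and confirming that none of them does better. This is a numerical check only, not a proof; the dimensions, sample count, and seed below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 3
alpha = np.sort(rng.random(d))[::-1]   # nonincreasing, nonnegative
best = alpha[:n].sum()                 # claimed optimum: beta_j = 1 for j <= n

for _ in range(10_000):
    beta = rng.random(d)               # entries in [0, 1]
    s = beta.sum()
    if s > n:
        beta *= n / s                  # scaling down keeps entries in [0, 1]
    # every feasible beta scores at most the claimed optimum
    assert alpha @ beta <= best + 1e-9
```

The scaling step is what makes each sampled β feasible for the constraint ‖β‖_1 ≤ n without leaving the box [0,1]^d.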
  3. Kernel PCA: In this exercise we show how PCA can be used for constructing
    nonlinear dimensionality reduction on the basis of the kernel trick (see
    Chapter 16).
    Let X be some instance space and let S = {x_1, ..., x_m} be a set of points
    in X. Consider a feature mapping ψ : X → V, where V is some Hilbert space
    (possibly of infinite dimension). Let K : X × X → R be a kernel function, that
    is, K(x, x′) = ⟨ψ(x), ψ(x′)⟩. Kernel PCA is the process of mapping the elements
    of S into V using ψ, and then applying PCA over {ψ(x_1), ..., ψ(x_m)} into
    R^n. The output of this process is the set of reduced elements.
    Show how this process can be done in polynomial time in terms of m
    and n, assuming that each evaluation of K(·, ·) can be calculated in constant
    time. In particular, if your implementation requires multiplication of
    two matrices A and B, verify that their product can be computed. Similarly,