Understanding Machine Learning: From Theory to Algorithms


Dimensionality Reduction


23.1 Principal Component Analysis (PCA)


Let x_1, ..., x_m be m vectors in R^d. We would like to reduce the dimensionality of these vectors using a linear transformation. A matrix W ∈ R^{n,d}, where n < d, induces a mapping x ↦ Wx, where Wx ∈ R^n is the lower dimensional representation of x. Then, a second matrix U ∈ R^{d,n} can be used to (approximately) recover each original vector x from its compressed version. That is, for a compressed vector y = Wx, where y is in the low dimensional space R^n, we can construct x̃ = Uy, so that x̃ is the recovered version of x and resides in the original high dimensional space R^d.

In PCA, we find the compression matrix W and the recovering matrix U so that the total squared distance between the original and recovered vectors is minimal; namely, we aim at solving the problem

\[
\operatorname*{argmin}_{W \in \mathbb{R}^{n,d},\; U \in \mathbb{R}^{d,n}} \; \sum_{i=1}^{m} \|x_i - U W x_i\|_2^2. \tag{23.1}
\]
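As a concrete illustration, the objective in Equation (23.1) can be evaluated numerically. The following is a minimal sketch, assuming arbitrary random data and arbitrary (non-optimal) matrices W and U chosen only as placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 5, 2, 100          # original dim, reduced dim, number of vectors
X = rng.normal(size=(m, d))  # rows are the vectors x_1, ..., x_m

W = rng.normal(size=(n, d))  # compression matrix: x -> Wx in R^n
U = rng.normal(size=(d, n))  # recovery matrix:    y -> Uy in R^d

# Objective of Equation (23.1): total squared reconstruction error.
recovered = X @ W.T @ U.T            # row i is (U W x_i)^T
objective = np.sum((X - recovered) ** 2)
print(objective)
```

PCA then asks for the pair (W, U) making this quantity as small as possible over all choices of the two matrices.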

To solve this problem we first show that the optimal solution takes a specific form.

Lemma 23.1  Let (U, W) be a solution to Equation (23.1). Then the columns of U are orthonormal (namely, U^T U is the identity matrix of R^n) and W = U^T.
Proof  Fix any U, W and consider the mapping x ↦ UWx. The range of this mapping, R = {UWx : x ∈ R^d}, is an n-dimensional linear subspace of R^d. Let V ∈ R^{d,n} be a matrix whose columns form an orthonormal basis of this subspace; namely, the range of V is R and V^T V = I. Therefore, each vector in R can be written as Vy where y ∈ R^n. For every x ∈ R^d and y ∈ R^n we have

\[
\|x - Vy\|_2^2 = \|x\|^2 + y^\top V^\top V y - 2 y^\top V^\top x = \|x\|^2 + \|y\|^2 - 2 y^\top (V^\top x),
\]

where we used the fact that V^T V is the identity matrix of R^n. Minimizing the preceding expression with respect to y by setting the gradient with respect to y to zero gives y = V^T x. Therefore, for each x we have that
\[
V V^\top x = \operatorname*{argmin}_{\tilde{x} \in \mathcal{R}} \|x - \tilde{x}\|_2^2.
\]
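This projection property can be checked numerically. The sketch below builds an orthonormal basis V of a random n-dimensional subspace (via a QR factorization, one standard way to obtain such a basis) and verifies that V V^T x is at least as close to x as many other candidate points Vy in the subspace; all names and dimensions here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 6, 3

# Orthonormal basis V of an n-dimensional subspace of R^d (via QR).
V, _ = np.linalg.qr(rng.normal(size=(d, n)))
x = rng.normal(size=d)

# The minimizing y is V^T x, so the closest point in the subspace is V V^T x.
best = V @ (V.T @ x)
best_err = np.sum((x - best) ** 2)

# Any other candidate Vy is at least as far from x.
for _ in range(1000):
    y = rng.normal(size=n)
    assert np.sum((x - V @ y) ** 2) >= best_err - 1e-12
```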

In particular, this holds for x_1, ..., x_m, and therefore we can replace U, W by V, V^T without increasing the objective:

\[
\sum_{i=1}^{m} \|x_i - U W x_i\|_2^2 \;\ge\; \sum_{i=1}^{m} \|x_i - V V^\top x_i\|_2^2.
\]

Since this holds for every U, W, the proof of the lemma follows.
On the basis of the preceding lemma, we can rewrite the optimization problem given in Equation (23.1) as follows:

\[
\operatorname*{argmin}_{U \in \mathbb{R}^{d,n} :\, U^\top U = I} \; \sum_{i=1}^{m} \|x_i - U U^\top x_i\|_2^2. \tag{23.2}
\]
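Looking ahead (the derivation comes later in the chapter, so this is offered only as a numerical sanity check, not the text's own argument): the minimizer of Equation (23.2) turns out to be the matrix whose columns are the top n right singular vectors of the data matrix. The sketch below, with illustrative dimensions, compares the objective at that candidate against a random orthonormal U:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, m = 5, 2, 200
X = rng.normal(size=(m, d)) @ rng.normal(size=(d, d))  # correlated data

def objective(U):
    # Equation (23.2): sum_i ||x_i - U U^T x_i||^2 for orthonormal U.
    proj = X @ U @ U.T
    return np.sum((X - proj) ** 2)

# Candidate 1: a random orthonormal U (via QR).
U_rand, _ = np.linalg.qr(rng.normal(size=(d, n)))

# Candidate 2: top-n right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
U_pca = Vt[:n].T

print(objective(U_rand), objective(U_pca))
```

On such data the singular-vector candidate attains an objective no larger than that of the random orthonormal U, consistent with it being the minimizer of (23.2).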