Understanding Machine Learning: From Theory to Algorithms


Dimensionality Reduction


23.1 Principal Component Analysis (PCA)


Let x_1, ..., x_m be m vectors in R^d. We would like to reduce the dimensionality of these vectors using a linear transformation. A matrix W ∈ R^{n,d}, where n < d, induces a mapping x ↦ Wx, where Wx ∈ R^n is the lower dimensional representation of x. Then, a second matrix U ∈ R^{d,n} can be used to (approximately) recover each original vector x from its compressed version. That is, for a compressed vector y = Wx, where y is in the low dimensional space R^n, we can construct x̃ = Uy, so that x̃ is the recovered version of x and resides in the original high dimensional space R^d.

In PCA, we find the compression matrix W and the recovering matrix U so that the total squared distance between the original and recovered vectors is minimal; namely, we aim at solving the problem

\[
\operatorname*{argmin}_{W \in \mathbb{R}^{n,d},\; U \in \mathbb{R}^{d,n}} \; \sum_{i=1}^{m} \|x_i - U W x_i\|_2^2. \tag{23.1}
\]
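As a concrete illustration, the objective in Equation (23.1) can be evaluated numerically. The following is a minimal sketch, assuming arbitrary random data and arbitrary (non-optimal) matrices W and U chosen only as placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 5, 2, 100          # original dim, reduced dim, number of vectors
X = rng.normal(size=(m, d))  # rows are the vectors x_1, ..., x_m

W = rng.normal(size=(n, d))  # compression matrix: x -> Wx in R^n
U = rng.normal(size=(d, n))  # recovery matrix:    y -> Uy in R^d

# Objective of Equation (23.1): total squared reconstruction error.
recovered = X @ W.T @ U.T            # row i is (U W x_i)^T
objective = np.sum((X - recovered) ** 2)
print(objective)
```

PCA then asks for the pair (W, U) making this quantity as small as possible over all choices of the two matrices.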

To solve this problem we first show that the optimal solution takes a specific form.

Lemma 23.1  Let (U, W) be a solution to Equation (23.1). Then the columns of U are orthonormal (namely, U^T U is the identity matrix of R^n) and W = U^T.
Proof  Fix any U, W and consider the mapping x ↦ UWx. The range of this mapping, R = {UWx : x ∈ R^d}, is an n-dimensional linear subspace of R^d. Let V ∈ R^{d,n} be a matrix whose columns form an orthonormal basis of this subspace; namely, the range of V is R and V^T V = I. Therefore, each vector in R can be written as Vy where y ∈ R^n. For every x ∈ R^d and y ∈ R^n we have

\[
\|x - Vy\|_2^2 = \|x\|^2 + y^\top V^\top V y - 2 y^\top V^\top x = \|x\|^2 + \|y\|^2 - 2 y^\top (V^\top x),
\]

where we used the fact that V^T V is the identity matrix of R^n. Minimizing the preceding expression with respect to y by setting the gradient with respect to y to zero gives y = V^T x. Therefore, for each x we have that
\[
V V^\top x = \operatorname*{argmin}_{\tilde{x} \in \mathcal{R}} \|x - \tilde{x}\|_2^2.
\]
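This projection property can be checked numerically. The sketch below builds an orthonormal basis V of a random n-dimensional subspace (via a QR factorization, one standard way to obtain such a basis) and verifies that V V^T x is at least as close to x as many other candidate points Vy in the subspace; all names and dimensions here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 6, 3

# Orthonormal basis V of an n-dimensional subspace of R^d (via QR).
V, _ = np.linalg.qr(rng.normal(size=(d, n)))
x = rng.normal(size=d)

# The minimizing y is V^T x, so the closest point in the subspace is V V^T x.
best = V @ (V.T @ x)
best_err = np.sum((x - best) ** 2)

# Any other candidate Vy is at least as far from x.
for _ in range(1000):
    y = rng.normal(size=n)
    assert np.sum((x - V @ y) ** 2) >= best_err - 1e-12
```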

In particular, this holds for x_1, ..., x_m, and therefore we can replace U, W by V, V^T without increasing the objective:

\[
\sum_{i=1}^{m} \|x_i - U W x_i\|_2^2 \;\ge\; \sum_{i=1}^{m} \|x_i - V V^\top x_i\|_2^2.
\]

Since this holds for every U, W, the proof of the lemma follows.
On the basis of the preceding lemma, we can rewrite the optimization problem given in Equation (23.1) as follows:

\[
\operatorname*{argmin}_{U \in \mathbb{R}^{d,n} :\, U^\top U = I} \; \sum_{i=1}^{m} \|x_i - U U^\top x_i\|_2^2. \tag{23.2}
\]
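Looking ahead (the derivation comes later in the chapter, so this is offered only as a numerical sanity check, not the text's own argument): the minimizer of Equation (23.2) turns out to be the matrix whose columns are the top n right singular vectors of the data matrix. The sketch below, with illustrative dimensions, compares the objective at that candidate against a random orthonormal U:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n, m = 5, 2, 200
X = rng.normal(size=(m, d)) @ rng.normal(size=(d, d))  # correlated data

def objective(U):
    # Equation (23.2): sum_i ||x_i - U U^T x_i||^2 for orthonormal U.
    proj = X @ U @ U.T
    return np.sum((X - proj) ** 2)

# Candidate 1: a random orthonormal U (via QR).
U_rand, _ = np.linalg.qr(rng.normal(size=(d, n)))

# Candidate 2: top-n right singular vectors of X.
_, _, Vt = np.linalg.svd(X, full_matrices=False)
U_pca = Vt[:n].T

print(objective(U_rand), objective(U_pca))
```

On such data the singular-vector candidate attains an objective no larger than that of the random orthonormal U, consistent with it being the minimizer of (23.2).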