23.1 Principal Component Analysis (PCA) 325
We further simplify the optimization problem by using the following elementary
algebraic manipulations. For everyx∈Rdand a matrixU∈Rd,nsuch that
U>U=Iwe have
‖x−UU>x‖^2 =‖x‖^2 − 2 x>UU>x+x>UU>UU>x
=‖x‖^2 −x>UU>x
=‖x‖^2 −trace(U>xx>U), (23.3)
where the trace of a matrix is the sum of its diagonal entries. Since the trace is
a linear operator, this allows us to rewrite Equation (23.2) as follows:
argmax
U∈Rd,n:U>U=I
trace
(
U>
∑m
i=1
xix>iU
)
. (23.4)
LetA =
∑m
i=1xix>i. The matrixAis symmetric and therefore it can be
written using its spectral decomposition asA=VDV>, whereDis diagonal and
V>V=VV>=I. Here, the elements on the diagonal ofDare the eigenvalues of
Aand the columns ofVare the corresponding eigenvectors. We assume without
loss of generality thatD 1 , 1 ≥D 2 , 2 ≥···≥Dd,d. SinceAis positive semidefinite
it also holds thatDd,d≥0. We claim that the solution to Equation (23.4) is
the matrixUwhose columns are theneigenvectors ofAcorresponding to the
largestneigenvalues.
theorem23.2 Letx 1 ,...,xmbe arbitrary vectors inRd, letA=
∑m
i=1xix
>i,
and letu 1 ,...,unbeneigenvectors of the matrixAcorresponding to the largest
neigenvalues ofA. Then, the solution to the PCA optimization problem given
in Equation (23.1) is to setUto be the matrix whose columns areu 1 ,...,un
and to setW=U>.
Proof LetVDV>be the spectral decomposition ofA. Fix some matrixU∈Rd,n
with orthonormal columns and letB=V>U. Then,VB =VV>U=U. It
follows that
U>AU=B>V>VDV>VB=B>DB,
and therefore
trace(U>AU) =
∑d
j=1
Dj,j
∑n
i=1
B^2 j,i.
Note thatB>B=U>VV>U =U>U =I. Therefore, the columns ofBare
also orthonormal, which implies that
∑d
j=1
∑n
i=1B
j,i^2 =n. In addition, letB ̃∈
Rd,dbe a matrix such that its firstncolumns are the columns ofBand in
additionB ̃>B ̃=I. Then, for everyjwe have
∑d
i=1B ̃
j,i^2 = 1, which implies that
∑n
i=1B
(^2) j,i≤1. It follows that:
trace(U>AU) ≤ max
β∈[0,1]d:‖β‖ 1 ≤n
∑d
j=1
Dj,jβj.