Understanding Machine Learning: From Theory to Algorithms


Clustering


definition 22.2 (Unnormalized Graph Laplacian) The unnormalized graph Laplacian is the $m \times m$ matrix $L = D - W$, where $D$ is the diagonal matrix with $D_{i,i} = \sum_{j=1}^{m} W_{i,j}$. The matrix $D$ is called the degree matrix.
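The definition above translates directly into a few lines of NumPy. The following is a minimal sketch; the function name `unnormalized_laplacian` is ours, and $W$ is assumed to be a symmetric matrix of nonnegative edge weights:

```python
import numpy as np

def unnormalized_laplacian(W):
    """Compute L = D - W, where D is the diagonal degree matrix
    with D[i, i] = sum_j W[i, j]."""
    D = np.diag(W.sum(axis=1))
    return D - W

# Example: a small symmetric weight matrix on three vertices.
W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.0],
              [0.2, 0.0, 0.0]])
L = unnormalized_laplacian(W)
# By construction every row of L sums to zero, so the all-ones
# vector is always an eigenvector of L with eigenvalue 0.
```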

The following lemma underscores the relation between RatioCut and the Laplacian matrix.

lemma 22.3 Let $C_1, \ldots, C_k$ be a clustering and let $H \in \mathbb{R}^{m,k}$ be the matrix such that
$$
H_{i,j} = \frac{1}{\sqrt{|C_j|}} \, \mathbf{1}[i \in C_j].
$$
Then, the columns of $H$ are orthonormal to each other and
$$
\mathrm{RatioCut}(C_1, \ldots, C_k) = \operatorname{trace}(H^\top L H).
$$
Proof Let $h_1, \ldots, h_k$ be the columns of $H$. The fact that these vectors are orthonormal is immediate from the definition. Next, by standard algebraic manipulations, it can be shown that $\operatorname{trace}(H^\top L H) = \sum_{i=1}^{k} h_i^\top L h_i$ and that for any vector $v$ we have
$$
v^\top L v = \frac{1}{2} \left( \sum_{r} D_{r,r} v_r^2 - 2 \sum_{r,s} v_r v_s W_{r,s} + \sum_{s} D_{s,s} v_s^2 \right) = \frac{1}{2} \sum_{r,s} W_{r,s} (v_r - v_s)^2.
$$
Applying this with $v = h_i$ and noting that $(h_{i,r} - h_{i,s})^2$ is nonzero only if $r \in C_i, s \notin C_i$ or the other way around, we obtain that
$$
h_i^\top L h_i = \frac{1}{|C_i|} \sum_{r \in C_i,\, s \notin C_i} W_{r,s}.
$$

Therefore, to minimize RatioCut we can search for a matrix $H$ whose columns are orthonormal and such that each $H_{i,j}$ is either $0$ or $1/\sqrt{|C_j|}$. Unfortunately, this is an integer programming problem which we cannot solve efficiently. Instead, we relax the latter requirement and simply search for an orthonormal matrix $H \in \mathbb{R}^{m,k}$ that minimizes $\operatorname{trace}(H^\top L H)$. As we will see in the next chapter about PCA (particularly, the proof of Theorem 23.2), the solution to this problem is to set $H$ to be the matrix whose columns are the eigenvectors corresponding to the $k$ minimal eigenvalues of $L$. The resulting algorithm is called Unnormalized Spectral Clustering.
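The relaxed problem reduces to an eigendecomposition. The sketch below computes the spectral embedding, i.e., the $m \times k$ matrix of eigenvectors for the $k$ smallest eigenvalues of $L$; in the full algorithm the rows of this matrix would then be grouped (e.g., by $k$-means) to produce the clustering. The function name and the example weight matrix are ours:

```python
import numpy as np

def unnormalized_spectral_embedding(W, k):
    """Return the m x k matrix whose columns are eigenvectors of the
    unnormalized Laplacian L = D - W for its k smallest eigenvalues."""
    L = np.diag(W.sum(axis=1)) - W
    # L is symmetric, so eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

# Two tight clusters, {0, 1} and {2, 3}, weakly connected to each other.
W = np.array([[0.0, 5.0, 0.1, 0.0],
              [5.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 5.0],
              [0.0, 0.1, 5.0, 0.0]])
U = unnormalized_spectral_embedding(W, k=2)
# Rows of U belonging to the same cluster are (near-)identical, so any
# simple grouping of the rows recovers the intended clustering.
```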