Understanding Machine Learning: From Theory to Algorithms


Clustering


definition 22.2 (Unnormalized Graph Laplacian) The unnormalized graph Laplacian is the $m \times m$ matrix $L = D - W$, where $D$ is the diagonal matrix with $D_{i,i} = \sum_{j=1}^{m} W_{i,j}$. The matrix $D$ is called the degree matrix.
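The definition above translates directly into a few lines of NumPy. The following is a minimal sketch; the function name `unnormalized_laplacian` is ours, and $W$ is assumed to be a symmetric matrix of nonnegative edge weights:

```python
import numpy as np

def unnormalized_laplacian(W):
    """Compute L = D - W, where D is the diagonal degree matrix
    with D[i, i] = sum_j W[i, j]."""
    D = np.diag(W.sum(axis=1))
    return D - W

# Example: a small symmetric weight matrix on three vertices.
W = np.array([[0.0, 1.0, 0.2],
              [1.0, 0.0, 0.0],
              [0.2, 0.0, 0.0]])
L = unnormalized_laplacian(W)
# By construction every row of L sums to zero, so the all-ones
# vector is always an eigenvector of L with eigenvalue 0.
```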

The following lemma underscores the relation between RatioCut and the Laplacian matrix.

lemma 22.3 Let $C_1, \ldots, C_k$ be a clustering and let $H \in \mathbb{R}^{m,k}$ be the matrix such that
$$
H_{i,j} = \frac{1}{\sqrt{|C_j|}} \, \mathbf{1}[i \in C_j].
$$
Then, the columns of $H$ are orthonormal to each other and
$$
\mathrm{RatioCut}(C_1, \ldots, C_k) = \operatorname{trace}(H^\top L H).
$$
Proof Let $h_1, \ldots, h_k$ be the columns of $H$. The fact that these vectors are orthonormal is immediate from the definition. Next, by standard algebraic manipulations, it can be shown that $\operatorname{trace}(H^\top L H) = \sum_{i=1}^{k} h_i^\top L h_i$ and that for any vector $v$ we have
$$
v^\top L v = \frac{1}{2} \left( \sum_{r} D_{r,r} v_r^2 - 2 \sum_{r,s} v_r v_s W_{r,s} + \sum_{s} D_{s,s} v_s^2 \right) = \frac{1}{2} \sum_{r,s} W_{r,s} (v_r - v_s)^2.
$$
Applying this with $v = h_i$ and noting that $(h_{i,r} - h_{i,s})^2$ is nonzero only if $r \in C_i, s \notin C_i$ or the other way around, we obtain that
$$
h_i^\top L h_i = \frac{1}{|C_i|} \sum_{r \in C_i,\, s \notin C_i} W_{r,s}.
$$

Therefore, to minimize RatioCut we can search for a matrix $H$ whose columns are orthonormal and such that each $H_{i,j}$ is either $0$ or $1/\sqrt{|C_j|}$. Unfortunately, this is an integer programming problem which we cannot solve efficiently. Instead, we relax the latter requirement and simply search for an orthonormal matrix $H \in \mathbb{R}^{m,k}$ that minimizes $\operatorname{trace}(H^\top L H)$. As we will see in the next chapter about PCA (particularly, the proof of Theorem 23.2), the solution to this problem is to set $H$ to be the matrix whose columns are the eigenvectors corresponding to the $k$ minimal eigenvalues of $L$. The resulting algorithm is called Unnormalized Spectral Clustering.
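The relaxed problem reduces to an eigendecomposition. The sketch below computes the spectral embedding, i.e., the $m \times k$ matrix of eigenvectors for the $k$ smallest eigenvalues of $L$; in the full algorithm the rows of this matrix would then be grouped (e.g., by $k$-means) to produce the clustering. The function name and the example weight matrix are ours:

```python
import numpy as np

def unnormalized_spectral_embedding(W, k):
    """Return the m x k matrix whose columns are eigenvectors of the
    unnormalized Laplacian L = D - W for its k smallest eigenvalues."""
    L = np.diag(W.sum(axis=1)) - W
    # L is symmetric, so eigh returns eigenvalues in ascending order.
    _, vecs = np.linalg.eigh(L)
    return vecs[:, :k]

# Two tight clusters, {0, 1} and {2, 3}, weakly connected to each other.
W = np.array([[0.0, 5.0, 0.1, 0.0],
              [5.0, 0.0, 0.0, 0.1],
              [0.1, 0.0, 0.0, 5.0],
              [0.0, 0.1, 5.0, 0.0]])
U = unnormalized_spectral_embedding(W, k=2)
# Rows of U belonging to the same cluster are (near-)identical, so any
# simple grouping of the rows recovers the intended clustering.
```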