Social Media Mining: An Introduction

(Axel Boer) #1

P1: WQS Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-06 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:15


6.1 Community Detection 155

ithentry on the diagonal ofXTAXrepresents the number of edges that are
inside communityi. Similarly, theith element on the diagonal ofXTDX
represents the number of edges that are connected to members of commu-
nityi. Hence, theithelement on the diagonal ofXT(D−A)Xrepresents
the number of edges that are in the cut that separates communityifrom all
other nodes. In fact, theith diagonal element ofXT(D−A)Xis equivalent
to the summation termcut(Pi,P ̄i) in both the ratio and normalized cut.
Thus, for ratio cut, we have

Ratio Cut(P)=

1


k

∑k

i= 1

cut(Pi,P ̄i)
|Pi|

(6.9)


=


1


k

∑k

i= 1

XiT(D−A)Xi
XTiXi

(6.10)


=


1


k

∑k

i= 1

XˆTi(D−A)Xˆi, (6.11)

whereXˆi=Xi/(XiTXi)^1 /^2. A similar approach can be followed to for-
mulate the normalized cut and to obtain a differentXˆi. To formulate the
summation in both the ratio and normalized cut, we can use the trace of
matrix (tr(Xˆ)=

∑n
i= 1 xˆii). Using the trace, the objectives for both the ratio
and normalized cut can be formulated as trace-minimization problems,

min

Tr(XˆTLXˆ), (6.12)

whereLis the(normalized) graph Laplacian, defined as follows:

L=


{


D−A Ratio Cut Laplacian (Unnormalized Laplacian);
I−D−^1 /^2 AD−^1 /^2 Normalized Cut Laplacian (Normalized Laplacian).

(6.13)

It has been shown that both ratio cut and normalized cut minimization are NORMALIZED
AND
UNNORMALIZED
GRAPH
LAPLACIAN

NP-hard; therefore, approximation algorithms using relaxations are desired.
Spectral clustering is one such relaxation:

min

Tr(XˆTLXˆ), (6.14)

s.t. XˆTXˆ=Ik. (6.15)
Free download pdf