Social Media Mining: An Introduction

(Axel Boer) #1

P1: WQS Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-06 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:15


158 Community Analysis

out ofvirandomly, the probability of this edge getting connected to node
vjis∑dj
idi

= 2 dmj. Because the degree forviisdi,wehavedinumber of
such edges; hence, the expected number of edges betweenviandvjis
didj
2 m. So, given a degree distribution, the expected number of edges between
any pair of vertices can be computed. Real-world communities are far from
random; therefore, the more distant they are from randomly generated com-
MODULARITY munities, the more structure they exhibit. Modularity defines this distance,
and modularity maximization tries to maximize this distance. Consider a
partitioning of the graph G intokpartitions,P=(P 1 ,P 2 ,P 3 ,...,Pk). For
partitionPx, this distance can be defined as

vi,vj∈Px

Aij−

didj
2 m

. (6.20)


This distance can be generalized for partitioningPwithkpartitions,

∑k

x= 1


vi,vj∈Px

Aij−

didj
2 m

. (6.21)


The summation is over all edges (m), and because all edges are counted
twice (Aij=Aji), the normalized version of this distance is defined as
modularityNewman [2006]:

Q=


1


2 m



∑k

x= 1


vi,vj∈Px

Aij−

didj
2 m


⎠. (6.22)


We define the modularity matrix asB=A−ddT/ 2 m, whered∈Rn×^1
is the degree vector for all nodes. Similar to spectral clustering matrix
formulation, modularity can be reformulated as

Q=


1


2 m

Tr(XTBX), (6.23)

whereX∈Rn×kis the indicator (partition membership) function; that is,
Xij=1iff.vi∈Pj. This objective can be maximized such that the best
membership function is extracted with respect to modularity. The problem
is NP-hard; therefore, we relaxXtoXˆthat has an orthogonal structure
(XˆTXˆ=Ik). The optimalXˆcan be computed using the topkeigenvectors
ofBcorresponding to the largest positive eigenvalue. Similar to spectral
Free download pdf