P1: WQS Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-06 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:15
158 Community Analysis
out ofvirandomly, the probability of this edge getting connected to node
vjis∑dj
idi
= 2 dmj. Because the degree forviisdi,wehavedinumber of
such edges; hence, the expected number of edges betweenviandvjis
didj
2 m. So, given a degree distribution, the expected number of edges between
any pair of vertices can be computed. Real-world communities are far from
random; therefore, the more distant they are from randomly generated com-
MODULARITY munities, the more structure they exhibit. Modularity defines this distance,
and modularity maximization tries to maximize this distance. Consider a
partitioning of the graph G intokpartitions,P=(P 1 ,P 2 ,P 3 ,...,Pk). For
partitionPx, this distance can be defined as
∑
vi,vj∈Px
Aij−
didj
2 m
. (6.20)
This distance can be generalized for partitioningPwithkpartitions,
∑k
x= 1
∑
vi,vj∈Px
Aij−
didj
2 m
. (6.21)
The summation is over all edges (m), and because all edges are counted
twice (Aij=Aji), the normalized version of this distance is defined as
modularityNewman [2006]:
Q=
1
2 m
⎛
⎝
∑k
x= 1
∑
vi,vj∈Px
Aij−
didj
2 m
⎞
⎠. (6.22)
We define the modularity matrix asB=A−ddT/ 2 m, whered∈Rn×^1
is the degree vector for all nodes. Similar to spectral clustering matrix
formulation, modularity can be reformulated as
Q=
1
2 m
Tr(XTBX), (6.23)
whereX∈Rn×kis the indicator (partition membership) function; that is,
Xij=1iff.vi∈Pj. This objective can be maximized such that the best
membership function is extracted with respect to modularity. The problem
is NP-hard; therefore, we relaxXtoXˆthat has an orthogonal structure
(XˆTXˆ=Ik). The optimalXˆcan be computed using the topkeigenvectors
ofBcorresponding to the largest positive eigenvalue. Similar to spectral