Social Media Mining: An Introduction



Influence and Homophily

This measure has its limitations. Consider a school in which every student
is Hispanic. All connections will necessarily be between Hispanics, so an
assortativity value of 1 is not a significant finding. Now consider a school
where half the population is white and half is Hispanic. It is statistically
expected that 50% of the connections will be between members of different
races. If connections in this school were only between whites and
Hispanics, with none inside either group, then our finding is significant.
To account for this issue, we can employ a common technique where we
measure the assortativity significance by subtracting the statistically
expected assortativity from the measured assortativity. The higher this
value, the more significant the assortativity observed.
Consider a graph $G(V, E)$, $|E| = m$, where the degrees are known
beforehand (how many friends an individual has), but the edges are not.
Consider two nodes $v_i$ and $v_j$, with degrees $d_i$ and $d_j$,
respectively. What is the expected number of edges between these two
nodes? Consider node $v_i$. For any edge going out of $v_i$ randomly, the
probability of this edge getting connected to node $v_j$ is
$\frac{d_j}{\sum_i d_i} = \frac{d_j}{2m}$. Since the degree of $v_i$ is
$d_i$, we have $d_i$ such edges; hence, the expected number of edges
between $v_i$ and $v_j$ is $\frac{d_i d_j}{2m}$. Now, the expected number
of edges between $v_i$ and $v_j$ that are of the same type is
$\frac{d_i d_j}{2m}\,\delta(t(v_i), t(v_j))$, and the expected number of
edges of the same type in the whole graph, normalized by the number of
edges $m$, is

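\begin{equation}
\frac{1}{m} \sum_{(v_i, v_j) \in E} \frac{d_i d_j}{2m}\,
\delta(t(v_i), t(v_j))
= \frac{1}{2m} \sum_{ij} \frac{d_i d_j}{2m}\,
\delta(t(v_i), t(v_j)). \tag{8.3}
\end{equation}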
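
As a minimal sketch, the right-hand side of Equation (8.3) can be evaluated
directly from a degree sequence and a list of node types. The function name
and the four-student school below are hypothetical and chosen only for
illustration; the sketch assumes that the degrees sum to 2m.

```python
from itertools import product


def expected_same_type_fraction(degrees, types):
    """Evaluate (1/2m) * sum_ij (d_i * d_j / 2m) * delta(t_i, t_j)."""
    two_m = sum(degrees)  # the degrees of a graph sum to 2m
    total = 0.0
    # The sum runs over all ordered pairs (i, j), including i == j.
    for i, j in product(range(len(degrees)), repeat=2):
        if types[i] == types[j]:  # delta(t(v_i), t(v_j)) = 1
            total += degrees[i] * degrees[j] / two_m
    return total / two_m


# Hypothetical school: two white and two Hispanic students, each of degree 2.
degrees = [2, 2, 2, 2]
types = ["W", "W", "H", "H"]
print(expected_same_type_fraction(degrees, types))  # prints 0.5
```

For this evenly split toy school the expected value is 0.5, which matches the
50% figure used in the school example earlier in the section.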

We are interested in computing the distance between the assortativity
observed and the expected assortativity:

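\begin{align}
Q &= \frac{1}{2m} \sum_{ij} A_{ij}\, \delta(t(v_i), t(v_j))
   - \frac{1}{2m} \sum_{ij} \frac{d_i d_j}{2m}\, \delta(t(v_i), t(v_j))
   \tag{8.4} \\
  &= \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{d_i d_j}{2m} \right)
   \delta(t(v_i), t(v_j)). \tag{8.5}
\end{align}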
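
The following sketch computes Q as written in Equation (8.5) for an
undirected graph given as a symmetric 0/1 adjacency matrix. The function name
and both toy graphs are assumptions made for illustration, not part of the
original text.

```python
def modularity(adj, types):
    """Evaluate Q = (1/2m) * sum_ij (A_ij - d_i * d_j / 2m) * delta(t_i, t_j)."""
    n = len(adj)
    degrees = [sum(row) for row in adj]  # d_i is the sum of row i of A
    two_m = sum(degrees)                 # the degrees sum to 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if types[i] == types[j]:     # delta(t(v_i), t(v_j)) = 1
                q += adj[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


types = ["W", "W", "H", "H"]

# Hypothetical school where every edge crosses the two groups.
cross_only = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
]
print(modularity(cross_only, types))   # prints -0.5

# Hypothetical school where every edge stays within a group.
within_only = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(modularity(within_only, types))  # prints 0.5
```

In both toy graphs the expected same-type fraction is 0.5, so the measured
assortativity (0 in the first graph, 1 in the second) is shifted by that
amount. The sign of Q then shows whether same-type edges are rarer or more
common than the degree sequence alone would suggest.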

This measure is called modularity (Newman [2006]). The maximum modularity
value for a network depends on the number of nodes of the same type and
degree. The maximum occurs when all edges are connecting nodes of