Social Media Mining: An Introduction

(Axel Boer) #1

P1: Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-10 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:56


280 Behavior Analytics

 Adamic and Adar Measure. A similar measure to Jaccard, this a
measure was introduced by Lada Adamic and Eytan Adar [AA]. The
intuition behind it measure is that if two individuals share a neighbor
and that neighbor is arareneighbor, it should have a higher impact
on their similarity. For instance, we can define the rareness of a node
based on its degree (i.e., the smaller the node’s degree, the higher
its rareness). The original version of the measure is defined based
on webpage features. A modified version based on neighborhood
information is

σ(x,y)=


z∈N(x)∩N(y)

1


log|N(z)|

. (10.5)


 Preferential Attachment. In the preferential attachment model dis-
cussed in Chapter 4, one assumes that nodes of higher degree have
a higher chance of getting connected to incoming nodes. Therefore,
in terms of connection probability, higher degree nodes are simi-
lar. The preferential attachment measure is defined to capture this
similarity:

σ(x,y)=|N(x)|·|N(y)|. (10.6)

Example 10.1. For the graph depicted in Figure 10.5, the similarity
between nodes 5 and 7 based on different neighborhood-based techniques
is

(Common Neighbor)σ(5,7)=|{ 4 , 6 }∩{ 4 }| = 1 (10.7)

(Jaccard)σ(5,7)=

|{ 4 , 6 }∩{ 4 }|


|{ 4 , 6 }∪{ 4 }|


=


1


2


(10.8)


(Adamic and Adar)σ(5,7)=

1


log|{ 5 , 6 , 7 }|

=


1


log 3

(10.9)


(Preferential Attachment)σ(5,7)=|{ 4 }| · |{ 4 , 6 }| = 1 × 2 =2 (10.10)

v 3 v 4 v 7

v 2

v 1 v 6

v 5 v^8

Figure 10.5. Neighborhood-Based Link Prediction Example.
Free download pdf