Social Media Mining: An Introduction

CUUS2079-03 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 16:45


3.4 Similarity

Figure 3.14. Sample Graph for Computing Similarity (nodes $v_1$ through $v_6$).

Example 3.14. Consider the graph in Figure 3.14. The similarity values
between nodes $v_2$ and $v_5$ are

$$\sigma_{\text{Jaccard}}(v_2, v_5) = \frac{|\{v_1, v_3, v_4\} \cap \{v_3, v_6\}|}{|\{v_1, v_3, v_4, v_6\}|} = 0.25, \tag{3.60}$$


$$\sigma_{\text{Cosine}}(v_2, v_5) = \frac{|\{v_1, v_3, v_4\} \cap \{v_3, v_6\}|}{\sqrt{|\{v_1, v_3, v_4\}| \, |\{v_3, v_6\}|}} = \frac{1}{\sqrt{6}} \approx 0.41. \tag{3.61}$$
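The two values in Example 3.14 can be checked with a few lines of Python. This is a minimal sketch: the neighbor sets are copied from the example, while the function names `jaccard` and `cosine` and the handling of empty sets are illustrative choices, not part of the text.

```python
import math

# Neighbor sets of v2 and v5, taken from Example 3.14.
neighbors = {
    "v2": {"v1", "v3", "v4"},
    "v5": {"v3", "v6"},
}

def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|, with 0.0 for two empty sets (an assumed convention)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cosine(a, b):
    """|a ∩ b| / sqrt(|a| * |b|), with 0.0 if either set is empty (assumed)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

sigma_jaccard = jaccard(neighbors["v2"], neighbors["v5"])
sigma_cosine = cosine(neighbors["v2"], neighbors["v5"])
print(round(sigma_jaccard, 3))  # 0.25
print(round(sigma_cosine, 3))   # 0.408
```

Both measures share the numerator (the single common neighbor $v_3$) and differ only in how they normalize it.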


A more interesting way of measuring the similarity between $v_i$ and $v_j$
is to compare $\sigma(v_i, v_j)$ with the expected value of $\sigma(v_i, v_j)$ when nodes
pick their neighbors at random. The more distant these two values are, the
more significant the observed similarity $\sigma(v_i, v_j)$ is.
For nodes $v_i$ and $v_j$ with degrees $d_i$ and $d_j$, this expectation is $\frac{d_i d_j}{n}$, where
$n$ is the number of nodes. This is because any given node has a $\frac{d_i}{n}$ chance of becoming
$v_i$'s neighbor and, since $v_j$ selects $d_j$ neighbors, the expected overlap is $\frac{d_i d_j}{n}$.
We can rewrite $\sigma(v_i, v_j)$ as

$$\sigma(v_i, v_j) = |N(v_i) \cap N(v_j)| = \sum_{k} A_{i,k} A_{j,k}. \tag{3.62}$$
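The following sketch illustrates both Equation 3.62 and the random expectation discussed before it. Two things are assumptions rather than facts from the text: the adjacency matrix contains only the edges touching $v_2$ or $v_5$ that Example 3.14 implies (the remaining edges of Figure 3.14 are not listed in the text), and the random model lets each node pick its neighbors uniformly from all $n$ nodes, itself included, which matches the $\frac{d_i}{n}$ argument. The helper names are hypothetical.

```python
import random

n = 6  # nodes v1..v6 map to indices 0..5
# ASSUMPTION: only the edges implied by Example 3.14 (those touching v2 or v5).
edges = [(1, 0), (1, 2), (1, 3), (4, 2), (4, 5)]

A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1  # undirected graph, so the matrix is symmetric

def common_neighbors(A, i, j):
    """Equation 3.62: sum over k of A[i][k] * A[j][k]."""
    return sum(a * b for a, b in zip(A[i], A[j]))

def expected_overlap_mc(n, d_i, d_j, trials=20000, seed=0):
    """Monte Carlo estimate of the overlap when both nodes pick neighbors at random."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        picks_i = set(rng.sample(range(n), d_i))
        picks_j = set(rng.sample(range(n), d_j))
        total += len(picks_i & picks_j)
    return total / trials

i, j = 1, 4  # v2 and v5
d_i, d_j = sum(A[i]), sum(A[j])           # degrees 3 and 2
observed = common_neighbors(A, i, j)      # 1: only v3 is shared
analytic = d_i * d_j / n                  # 3 * 2 / 6 = 1.0
simulated = expected_overlap_mc(n, d_i, d_j)
print(observed, analytic, round(simulated, 2))
```

The simulated average should land close to the analytic value $\frac{d_i d_j}{n}$, since each of the $n$ nodes is picked by both $v_i$ and $v_j$ with probability $\frac{d_i}{n} \cdot \frac{d_j}{n}$.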

Hence, a similarity measure can be defined by subtracting the random
expectation $\frac{d_i d_j}{n}$ from Equation 3.62:

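$$
\begin{aligned}
\sigma_{\text{significance}}(v_i, v_j) &= \sum_{k} A_{i,k} A_{j,k} - \frac{d_i d_j}{n} \\
&= \sum_{k} A_{i,k} A_{j,k} - n \left( \frac{1}{n} \sum_{k} A_{i,k} \right) \left( \frac{1}{n} \sum_{k} A_{j,k} \right) \\
&= \sum_{k} A_{i,k} A_{j,k} - n \bar{A}_i \bar{A}_j \\
&= \sum_{k} \left( A_{i,k} A_{j,k} - \bar{A}_i \bar{A}_j \right)
\end{aligned}
$$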
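As a numerical check on the derivation above, the sketch below evaluates its first and last lines for $v_2$ and $v_5$ and confirms that they agree. Here $\bar{A}_i$ is taken to be the mean of row $i$ of the adjacency matrix, $\frac{1}{n} \sum_k A_{i,k}$, as the second line of the derivation implies. The matrix again holds only the edges implied by Example 3.14, which is an assumption, and the function names are hypothetical.

```python
import math

n = 6  # nodes v1..v6 map to indices 0..5
# ASSUMPTION: only the edges implied by Example 3.14 (those touching v2 or v5).
edges = [(1, 0), (1, 2), (1, 3), (4, 2), (4, 5)]
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

def significance_direct(A, i, j):
    """First line of the derivation: common neighbors minus d_i * d_j / n."""
    n = len(A)
    common = sum(a * b for a, b in zip(A[i], A[j]))
    return common - sum(A[i]) * sum(A[j]) / n

def significance_centered(A, i, j):
    """Last line: sum over k of (A[i][k] * A[j][k] - mean_i * mean_j)."""
    n = len(A)
    mean_i = sum(A[i]) / n
    mean_j = sum(A[j]) / n
    return sum(a * b - mean_i * mean_j for a, b in zip(A[i], A[j]))

direct = significance_direct(A, 1, 4)      # v2 and v5
centered = significance_centered(A, 1, 4)
print(direct)                                          # 0.0
print(math.isclose(direct, centered, abs_tol=1e-12))   # True
```

For $v_2$ and $v_5$ the result is zero: with degrees 3 and 2 in a six-node graph, one shared neighbor is exactly what random neighbor selection would produce on average.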