Social Media Mining: An Introduction

CUUS2079-03 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 16:45

3.4 Similarity

[Figure 3.14. Sample Graph for Computing Similarity. Nodes: v1, v2, v3, v4, v5, v6.]

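Example 3.14. Consider the graph in Figure 3.14. Nodes $v_2$ and $v_5$ have neighborhoods $N(v_2)=\{v_1,v_3,v_4\}$ and $N(v_5)=\{v_3,v_6\}$, so the similarity values between them are

$$\sigma_{\text{Jaccard}}(v_2,v_5)=\frac{|\{v_1,v_3,v_4\}\cap\{v_3,v_6\}|}{|\{v_1,v_3,v_4,v_6\}|}=0.25, \tag{3.60}$$

$$\sigma_{\text{Cosine}}(v_2,v_5)=\frac{|\{v_1,v_3,v_4\}\cap\{v_3,v_6\}|}{\sqrt{|\{v_1,v_3,v_4\}|\,|\{v_3,v_6\}|}}=\frac{1}{\sqrt{6}}\approx 0.408. \tag{3.61}$$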
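The two values in Example 3.14 can be checked with a few lines of Python. This is a minimal sketch that works directly on the neighbor sets stated in the example; the function names `jaccard` and `cosine` are illustrative choices, not part of the text.

```python
import math

def jaccard(a, b):
    """Jaccard similarity: shared neighbors over the union of neighbors."""
    return len(a & b) / len(a | b)

def cosine(a, b):
    """Cosine similarity: shared neighbors over the geometric mean of the degrees."""
    return len(a & b) / math.sqrt(len(a) * len(b))

# Neighbor sets of v2 and v5 as given in Example 3.14.
n_v2 = {"v1", "v3", "v4"}
n_v5 = {"v3", "v6"}

jac = jaccard(n_v2, n_v5)   # 1 shared neighbor out of 4 distinct neighbors
cos = cosine(n_v2, n_v5)    # 1 / sqrt(3 * 2)
print(f"Jaccard = {jac:.3f}, Cosine = {cos:.3f}")
```

Both measures share the same numerator (the single common neighbor $v_3$) and differ only in how they normalize it.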

A more interesting way of measuring the similarity between $v_i$ and $v_j$ is to compare $\sigma(v_i,v_j)$ with the expected value of $\sigma(v_i,v_j)$ when nodes pick their neighbors at random. The further apart these two values are, the more significant the observed similarity $\sigma(v_i,v_j)$ is. For nodes $v_i$ and $v_j$ with degrees $d_i$ and $d_j$, this expectation is $\frac{d_i d_j}{n}$, where $n$ is the number of nodes. This is because a randomly chosen node has a $\frac{d_i}{n}$ chance of being $v_i$'s neighbor and, since $v_j$ selects $d_j$ neighbors, the expected overlap is $\frac{d_i d_j}{n}$. We can rewrite $\sigma(v_i,v_j)$ as

$$\sigma(v_i,v_j)=|N(v_i)\cap N(v_j)|=\sum_{k}A_{i,k}A_{j,k}. \tag{3.62}$$
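The following sketch illustrates both ideas above: the common-neighbor count as a dot product of adjacency-matrix rows, and the random expectation of the overlap. It rests on two assumptions that are not in the text. First, the edge list contains only the edges implied by the neighbor sets of $v_2$ and $v_5$ in Example 3.14; Figure 3.14 may contain further edges among the other nodes. Second, `expected_overlap` models "picking neighbors at random" as choosing $d_j$ distinct nodes uniformly from all $n$ nodes and averages the overlap exactly by enumeration.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: only the edges implied by N(v2) and N(v5) in Example 3.14.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 5), (5, 6)]
N_NODES = 6

def adjacency_matrix(edges, n):
    """Build a symmetric 0/1 adjacency matrix from 1-based node labels."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1
    return A

def common_neighbors(A, i, j):
    """Sum over k of A[i][k] * A[j][k], i.e. |N(v_i) ∩ N(v_j)|."""
    return sum(a * b for a, b in zip(A[i], A[j]))

def expected_overlap(n, d_i, d_j):
    """Exact mean overlap between a fixed set of d_i nodes and a
    uniformly random set of d_j nodes drawn from n nodes."""
    fixed = set(range(d_i))  # which d_i nodes are fixed does not matter
    total = count = 0
    for choice in combinations(range(n), d_j):
        total += len(fixed.intersection(choice))
        count += 1
    return Fraction(total, count)

A = adjacency_matrix(EDGES, N_NODES)
i, j = 1, 4  # zero-based indices of v2 and v5
observed = common_neighbors(A, i, j)
expected = expected_overlap(N_NODES, sum(A[i]), sum(A[j]))
print(observed, expected)
```

Under these assumptions the enumeration agrees with the closed form $\frac{d_i d_j}{n}$, which is the mean of a hypergeometric distribution.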

Hence, a similarity measure can be defined by subtracting the random expectation $\frac{d_i d_j}{n}$ from Equation 3.62:

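$$
\begin{aligned}
\sigma_{\text{significance}}(v_i,v_j) &= \sum_{k}A_{i,k}A_{j,k}-\frac{d_i d_j}{n} \\
&= \sum_{k}A_{i,k}A_{j,k}-n\left(\frac{1}{n}\sum_{k}A_{i,k}\right)\left(\frac{1}{n}\sum_{k}A_{j,k}\right) \\
&= \sum_{k}A_{i,k}A_{j,k}-n\bar{A}_i\bar{A}_j \\
&= \sum_{k}\left(A_{i,k}A_{j,k}-\bar{A}_i\bar{A}_j\right)
\end{aligned}
$$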
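As a hedged check of the derivation above, the sketch below evaluates the first and last lines of the chain and confirms they agree. The edge list is an assumption: it contains only the edges implied by the neighbor sets of $v_2$ and $v_5$ in Example 3.14, so only the value for that pair reflects the book's graph. The function names are illustrative, and exact fractions are used to avoid floating-point noise.

```python
from fractions import Fraction

# Assumption: only the edges implied by N(v2) and N(v5) in Example 3.14.
EDGES = [(1, 2), (2, 3), (2, 4), (3, 5), (5, 6)]
N_NODES = 6

A = [[0] * N_NODES for _ in range(N_NODES)]
for u, v in EDGES:
    A[u - 1][v - 1] = 1
    A[v - 1][u - 1] = 1

def significance(A, i, j):
    """First line: common neighbors minus d_i * d_j / n."""
    n = len(A)
    common = sum(A[i][k] * A[j][k] for k in range(n))
    return common - Fraction(sum(A[i]) * sum(A[j]), n)

def significance_centered(A, i, j):
    """Last line: sum over k of (A[i][k] * A[j][k] - mean_i * mean_j)."""
    n = len(A)
    mean_i = Fraction(sum(A[i]), n)  # row mean, i.e. d_i / n
    mean_j = Fraction(sum(A[j]), n)  # row mean, i.e. d_j / n
    return sum(A[i][k] * A[j][k] - mean_i * mean_j for k in range(n))

# v2 and v5 are rows 1 and 4 (zero-based).
sig_v2_v5 = significance(A, 1, 4)
print(sig_v2_v5)
```

For $v_2$ and $v_5$ the result is 0: their single common neighbor is exactly what the random expectation $\frac{3\cdot 2}{6}=1$ predicts, so the observed overlap is not significant by this measure.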
