P1: qVa Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-03 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 16:45
3.4 Similarity 73v 5v 6v 4v 1v 3v 2Figure 3.14. Sample Graph for Computing Similarity.Example 3.14.Consider the graph in Figure3.14. The similarity values
between nodesv 2 andv 5 areσJaccard(v 2 ,v 5 )=|{v 1 ,v 3 ,v 4 }∩{v 3 ,v 6 }|
|{v 1 ,v 3 ,v 4 ,v 6 }|= 0. 25 , (3.60)
σCosine(v 2 ,v 5 )=|{v 1 ,v 3 ,v 4 }∩{v 3 ,v 6 }|
√
|{v 1 ,v 3 ,v 4 }||{v 3 ,v 6 }|= 0. 40. (3.61)
A more interesting way of measuring the similarity betweenviandvj
is to compareσ(vi,vj) with the expected value ofσ(vi,vj) when nodes
pick their neighbors at random. The more distant these two values are, the
more significant the similarity observed betweenviandvj(σ(vi,vj)) is.
For nodesviandvjwith degreesdianddj, this expectation isdindj, where
nis the number of nodes. This is because there is adni chance of becoming
vi’s neighbor and, sincevjselectsdjneighbors, the expected overlap isdindj.
We can rewriteσ(vi,vj)asσ(vi,vj)=|N(vi)∩N(vj)|=∑
kAi,kAj,k. (3.62)Hence, a similarity measure can be defined by subtracting the random
expectationdindj from Equation 3.62:σsignificance(vi,vj)=∑
kAi,kAj,k−didj
n=
∑
kAi,kAj,k−n1
n∑
kAi,k1
n∑
kAj,k=
∑
kAi,kAj,k−nA ̄iA ̄j=
∑
k(Ai,kAj,k−A ̄iA ̄j)