Social Media Mining: An Introduction


CUUS2079-03 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 16:45


72 Network Measures

the network in which they are embedded (i.e., network similarity) or based
on the similarity of the content they generate (i.e., content similarity). We
discuss content similarity in Chapter 5. In this section, we demonstrate
ways to compute similarity between two nodes using network information
regarding the nodes and the edges connecting them. When using network
information, the similarity between two nodes can be computed by measuring
their structural equivalence or their regular equivalence.

3.4.1 Structural Equivalence
To compute structural equivalence, we look at the neighborhood shared by
two nodes; the size of this neighborhood defines how similar two nodes
are. For instance, two brothers have sisters, a mother, a father,
grandparents, and so on in common. This shows that they are similar, whereas
two randomly chosen individuals do not have much in common and are
not similar.
The similarity measures detailed in this section are based on the overlap
between the neighborhoods of the nodes. Let $N(v_i)$ and $N(v_j)$ be the
neighbors of nodes $v_i$ and $v_j$, respectively. In this case, a measure of node
similarity can be defined as follows:

$$\sigma(v_i, v_j) = |N(v_i) \cap N(v_j)|. \tag{3.57}$$
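
As a minimal sketch of Equation 3.57, the count of shared neighbors can be computed with a set intersection. The four-node graph, its node labels, and the function name common_neighbors are hypothetical and chosen only for illustration; the graph is assumed to be undirected and stored as a dictionary mapping each node to the set of its neighbors.

```python
# Hypothetical undirected graph, stored as node -> set of neighbors.
# The node labels and edges are illustrative, not taken from the text.
graph = {
    "v1": {"v2", "v3"},
    "v2": {"v1", "v3", "v4"},
    "v3": {"v1", "v2", "v4"},
    "v4": {"v2", "v3"},
}


def common_neighbors(graph, vi, vj):
    """Return |N(vi) ∩ N(vj)|, the unnormalized similarity of Equation 3.57."""
    return len(graph[vi] & graph[vj])


print(common_neighbors(graph, "v1", "v4"))  # v1 and v4 share v2 and v3
print(common_neighbors(graph, "v1", "v2"))  # v1 and v2 share only v3
```

Because the result is a raw count, it is not bounded above by any fixed constant; it can be as large as the smaller of the two neighborhoods.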

For large networks, the value in Equation 3.57 can increase rapidly, because
nodes may share many neighbors. Generally, a similarity measure is expected
to be bounded, usually in the range [0,1]. Various normalization procedures
can be applied, such as the Jaccard similarity or the cosine similarity:

$$\sigma_{\text{Jaccard}}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}, \tag{3.58}$$

$$\sigma_{\text{Cosine}}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{\sqrt{|N(v_i)|\,|N(v_j)|}}. \tag{3.59}$$
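
The two normalizations can be sketched as follows, again on a hypothetical four-node undirected graph stored as a dictionary of neighbor sets. The function names are illustrative, and returning 0.0 when a denominator would be zero is an assumption made here for convenience; the text does not specify how isolated nodes should be handled.

```python
import math

# Hypothetical undirected graph, stored as node -> set of neighbors.
graph = {
    "v1": {"v2", "v3"},
    "v2": {"v1", "v3", "v4"},
    "v3": {"v1", "v2", "v4"},
    "v4": {"v2", "v3"},
}


def jaccard_similarity(graph, vi, vj):
    """Equation 3.58: shared neighbors divided by the size of the union."""
    ni, nj = graph[vi], graph[vj]
    union = ni | nj
    if not union:
        return 0.0  # assumed convention when both nodes are isolated
    return len(ni & nj) / len(union)


def cosine_similarity(graph, vi, vj):
    """Equation 3.59: shared neighbors divided by sqrt(|N(vi)| * |N(vj)|)."""
    ni, nj = graph[vi], graph[vj]
    if not ni or not nj:
        return 0.0  # assumed convention when either node is isolated
    return len(ni & nj) / math.sqrt(len(ni) * len(nj))


print(jaccard_similarity(graph, "v1", "v2"))  # 1 shared of 4 in the union
print(cosine_similarity(graph, "v1", "v2"))   # 1 / sqrt(2 * 3)
```

Both measures stay in the range [0,1] and equal 1 when the two neighborhoods are identical, as for v1 and v4 in this example.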


In general, the definition of neighborhood $N(v_i)$ excludes the node itself
($v_i$). This leads to problems with the aforementioned similarities because
nodes that are connected and do not share a neighbor will be assigned zero
similarity. This can be rectified by assuming that nodes are included in their
own neighborhoods.
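
The effect of including a node in its own neighborhood can be illustrated with a small sketch. The three-node path graph a-b-c, the function name, and the include_self flag are hypothetical; the Jaccard similarity is used here, but the same adjustment applies to the cosine similarity.

```python
# Hypothetical path graph a - b - c, stored as node -> set of neighbors.
path = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
}


def jaccard_similarity(graph, vi, vj, include_self=False):
    """Jaccard similarity, optionally adding each node to its own neighborhood."""
    ni = (graph[vi] | {vi}) if include_self else graph[vi]
    nj = (graph[vj] | {vj}) if include_self else graph[vj]
    union = ni | nj
    if not union:
        return 0.0  # assumed convention for two isolated nodes
    return len(ni & nj) / len(union)


# a and b are connected but share no neighbor, so the similarity is zero.
print(jaccard_similarity(path, "a", "b"))
# With each node in its own neighborhood, {a, b} and {a, b, c} overlap.
print(jaccard_similarity(path, "a", "b", include_self=True))
```

With the open neighborhoods the connected pair (a, b) receives zero similarity; with each node added to its own neighborhood the intersection becomes {a, b} and the similarity is 2/3.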