Social Media Mining: An Introduction

CUUS2079-03 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 16:45

Network Measures

the network in which they are embedded (i.e., network similarity) or based on the similarity of the content they generate (i.e., content similarity). We discuss content similarity in Chapter 5. In this section, we demonstrate ways to compute similarity between two nodes using network information regarding the nodes and edges connecting them. When using network information, the similarity between two nodes can be computed by measuring their structural equivalence or their regular equivalence.

3.4.1 Structural Equivalence

To compute structural equivalence, we look at the neighborhood shared by two nodes; the size of this neighborhood defines how similar two nodes are. For instance, two brothers have sisters, a mother, a father, grandparents, and so on in common. This shows that they are similar, whereas two random male or female individuals do not have much in common and are not similar. The similarity measures detailed in this section are based on the overlap between the neighborhoods of the nodes. Let $N(v_i)$ and $N(v_j)$ be the neighbors of nodes $v_i$ and $v_j$, respectively. In this case, a measure of node similarity can be defined as follows:

$$\sigma(v_i, v_j) = |N(v_i) \cap N(v_j)|. \tag{3.57}$$

For large networks, this value can increase rapidly, because nodes may share many neighbors. Generally, similarity is attributed to a value that is bounded and is usually in the range [0, 1]. Various normalization procedures can take place, such as the Jaccard similarity or the cosine similarity:

$$\sigma_{\text{Jaccard}}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}, \tag{3.58}$$

$$\sigma_{\text{Cosine}}(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{\sqrt{|N(v_i)|\,|N(v_j)|}}. \tag{3.59}$$
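The three measures can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the toy graph, the adjacency-map representation, and the function names are assumptions made for the example, and the zero-denominator handling is a convention chosen here.

```python
from math import sqrt

# Hypothetical undirected toy graph: node -> set of neighbors (node itself excluded).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c"},
}

def common_neighbors(g, u, v):
    """Unnormalized overlap |N(u) ∩ N(v)|, as in Equation 3.57."""
    return len(g[u] & g[v])

def jaccard(g, u, v):
    """Overlap divided by the size of the union, as in Equation 3.58."""
    union = g[u] | g[v]
    # Assumed convention: two isolated nodes get similarity 0.
    return len(g[u] & g[v]) / len(union) if union else 0.0

def cosine(g, u, v):
    """Overlap divided by sqrt(|N(u)| * |N(v)|), as in Equation 3.59."""
    denom = sqrt(len(g[u]) * len(g[v]))
    # Assumed convention: a node with no neighbors gets similarity 0.
    return len(g[u] & g[v]) / denom if denom else 0.0

print(common_neighbors(graph, "a", "c"))  # shared neighbors are b and d
print(jaccard(graph, "a", "c"))
print(cosine(graph, "a", "c"))
```

In this toy graph, nodes b and d have identical neighborhoods, so both normalized measures reach their upper bound of 1 for that pair, whereas the raw count in Equation 3.57 has no such bound.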

In general, the definition of the neighborhood $N(v_i)$ excludes the node itself ($v_i$). This leads to problems with the aforementioned similarities because nodes that are connected and do not share a neighbor will be assigned zero similarity. This can be rectified by assuming that nodes are included in their own neighborhoods.
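The effect of including each node in its own neighborhood can be seen on a small example. The sketch below is illustrative only: the four-node graph and the `include_self` parameter are assumptions introduced here, not notation from the text.

```python
# Hypothetical toy graph: u and v are connected but share no other neighbor.
graph = {
    "u": {"v", "x"},
    "v": {"u", "y"},
    "x": {"u"},
    "y": {"v"},
}

def neighborhood(g, node, include_self=False):
    """Return N(node); with include_self=True the node itself is added."""
    return g[node] | {node} if include_self else set(g[node])

def jaccard(g, a, b, include_self=False):
    """Jaccard similarity over open or self-inclusive neighborhoods."""
    na = neighborhood(g, a, include_self)
    nb = neighborhood(g, b, include_self)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0

# Open neighborhoods: {v, x} and {u, y} do not overlap.
print(jaccard(graph, "u", "v"))
# Self-inclusive neighborhoods: {u, v, x} and {u, v, y} overlap in {u, v}.
print(jaccard(graph, "u", "v", include_self=True))
```

With the node excluded, the connected pair u and v scores zero; once each node is counted in its own neighborhood, the edge between them contributes to the overlap and the similarity becomes positive.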
