Social Media Mining: An Introduction

(Axel Boer) #1

P1: qVa Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-03 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 16:45


3.6 Bibliographic Notes 77

Linking between nodes (e.g., befriending in social media) is the most
commonly observed phenomenon in social media. Linking behavior is
analyzed in terms of its transitivity and its reciprocity. Transitivity is “when
a friend of my friend is my friend.” The transitivity of linking behavior
is analyzed by means of the clustering coefficient. The global clustering
coefficient analyzes transitivity within a network, and the local clustering
coefficient performs that for a node. Transitivity is commonly considered
for closed triads of edges. For loops of length 2, the problem is simplified
and is called reciprocity. In other words, reciprocity is when “if you become
my friend, I’ll be yours.”
To analyze if relationships are consistent in social media, we used various
social theories to validate outcomes. Social balance and social status are
two such theories.
Finally, we analyzed node similarity measures. In structural equivalence,
two nodes are considered similar when they share neighborhoods. We dis-
cussed cosine similarity and Jaccard similarity in structural equivalence.
In regular equivalence, nodes are similar when their neighborhoods are
similar.

3.6 Bibliographic Notes

General reviews of different measures in graphs, networks, the web, and
social media can be found in [Newman, 2010;Witten, Frank, and Hall,
2011 ;Tan et al., 2005;Jiawei and Kamber, 2001;Wasserman and Faust,
1994 ].
A more detailed description of the PageRank algorithm can be found in
[Page et al., 1999;Liu, 2007]. In practice, to compute the PageRank values,
thepower iteration methodis used. Given a matrixA, this method produces
an eigenvalueλand an eigenvectorvofA. In the case of PageRank, eigen-
valueλis set to 1. The iterative algorithm starts with an initial eigenvector
v 0 and then,vk+ 1 is computed fromvkas follows,

vk+ 1 =Avk. (3.72)
The iterative process is continued untilvk≈vk+ 1 (i.e., convergence
occurs). Other similar techniques to PageRank for computing influential
nodes in a webgraph, such as the HITS [Kleinberg, 1998] algorithm, can be
found in [Chakrabarti, 2003;Kosala and Blockeel, 2000].Unlike PageRank,
the HITS algorithm^5 considers two types of nodes: authority nodes and hub
nodes. An authority is a webpage that has many in-links. A hub is a page with
many out-links. Authority pages have in-links from many hubs. In other
Free download pdf