Social Media Mining: An Introduction

P1: Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-10 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:56

282 Behavior Analytics

between the same pair of nodes. For example, consider two authorsx andywho have collaboratedctimes. In this case,|paths<x,^1 y>|=c. Hitting and Commute Time. Consider a random walk that starts at nodexand moves to adjacent nodes uniformly. Hitting timeHx,yis the expected number of random walk steps needed to reachystarting fromx. This is a distance measure. In fact, a smaller hitting time implies a higher similarity; therefore, a negation can turn it into a similarity measure:

σ(x,y)=−Hx,y. (10.12)

Note that if nodeyis highly connected to other nodes in the network (i.e., has a high stationary probabilityπy), then a random walk starting from anyxlikely ends up visitingyearly. Hence, all hitting times toy are very short, and all nodes become similar toy. To account for this, one can normalize hitting time by multiplying it with the stationary probabilityπy:

σ(x,y)=−Hx,yπy. (10.13)

Hitting time is not symmetric, and in general,Hx,y =Hy,x. Thus, one can introduce thecommutetime to mitigate this issue:

σ(x,y)=−(Hx,y+Hy,x). (10.14)

Similarly, commute time can also be normalized,

σ(x,y)=−(Hx,yπy+Hy,xπx). (10.15)

Rooted PageRank. A modified version of the PageRank algorithm can be used to measure similarity between two nodesxandy.In rooted PageRank, we measure the stationary probability of y:πy given the condition that during each random walk step, we jump to xwith probabilityαor a random neighbor with probability 1−α. The matrix format discussed in Chapter 3 can be used to solve this problem. SimRank. One can define similarity between two nodes recursively based on the similarity between their neighbors. In other words, similar nodes have similar neighbors. SimRank performs the following:

σ(x,y)=γ·

∑

x′∈N(x)

∑

y′∈N(y)σ(x′,y′) |N(x)||N(y)|

, (10.16)

Social Media Mining: An Introduction

∑

∑

, (10.16)

Get our desktop app

Company

Features

Documentation

Resources