Social Media Mining: An Introduction

CUUS2079-08 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:22

8.1 Measuring Assortativity

Figure 8.4. A Correlation Example. (Graph with nodes A, B, and C; the node values shown are 18, 21, and 20.)

We construct two variables $X_L$ and $X_R$, where for any edge $(v_i, v_j)$ we assume that $x_i$ is observed from variable $X_L$ and $x_j$ is observed from variable $X_R$. For Figure 8.4,

\[
X_L = \begin{bmatrix} 18 \\ 21 \\ 21 \\ 20 \end{bmatrix}, \quad
X_R = \begin{bmatrix} 21 \\ 18 \\ 20 \\ 21 \end{bmatrix}. \tag{8.16}
\]

In other words, $X_L$ represents the ordinal values associated with the left node of the edges, and $X_R$ represents the values associated with the right node of the edges. Our problem is therefore reduced to computing the covariance between variables $X_L$ and $X_R$. Note that since we are considering an undirected graph, both edges $(v_i, v_j)$ and $(v_j, v_i)$ exist; therefore, $x_i$ and $x_j$ are observed in both $X_L$ and $X_R$. Thus, $X_L$ and $X_R$ include the same set of values but in a different order. This implies that $X_L$ and $X_R$ have the same mean and standard deviation.

\[
E(X_L) = E(X_R), \tag{8.17}
\]
\[
\sigma(X_L) = \sigma(X_R). \tag{8.18}
\]
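As a rough illustration of this construction, the sketch below builds the two lists from an edge list and checks that they share a mean and standard deviation. The node names `u`, `v`, `w` and the helper `left_right` are hypothetical; the values and the two edges are inferred from Equation 8.16 rather than read off the figure.

```python
import math
import statistics

# Hypothetical node names; values and edges are inferred from Equation 8.16.
values = {"u": 18, "v": 21, "w": 20}
edges = [("u", "v"), ("v", "w")]  # each undirected edge is stored once


def left_right(values, edges):
    """Return (X_L, X_R), listing every undirected edge in both directions."""
    x_left, x_right = [], []
    for a, b in edges:
        x_left += [values[a], values[b]]   # left endpoints of (a, b) and (b, a)
        x_right += [values[b], values[a]]  # right endpoints of (a, b) and (b, a)
    return x_left, x_right


x_left, x_right = left_right(values, edges)
print(x_left)   # [18, 21, 21, 20]
print(x_right)  # [21, 18, 20, 21]

# Same values in a different order, hence equal mean and standard deviation.
same_values = sorted(x_left) == sorted(x_right)
same_mean = statistics.mean(x_left) == statistics.mean(x_right)
same_std = math.isclose(statistics.pstdev(x_left), statistics.pstdev(x_right))
print(same_values, same_mean, same_std)
```

Listing each edge in both directions is what makes the two lists permutations of each other; with a directed edge list this symmetry would not hold in general.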

Since we have $m$ edges and each edge appears twice in an undirected graph, $X_L$ and $X_R$ each have $2m$ elements. Each value $x_i$ appears $d_i$ times in each variable, since $v_i$ is an endpoint of $d_i$ edges. The covariance between $X_L$ and $X_R$ is
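The counting argument can be checked with a short sketch. It assumes the same hypothetical node names and inferred edges as Equation 8.16 suggests, and it relies on the node values being distinct so that occurrences can be counted by value.

```python
from collections import Counter

# Hypothetical node names; values and edges are inferred from Equation 8.16.
values = {"u": 18, "v": 21, "w": 20}
edges = [("u", "v"), ("v", "w")]
m = len(edges)

# Degree d_i of each node: the number of edges it is an endpoint of.
degree = Counter()
for a, b in edges:
    degree[a] += 1
    degree[b] += 1

# X_L holds the left endpoint of every edge taken in both directions.
x_left = [values[n] for a, b in edges for n in (a, b)]

# Count how often each value occurs (valid here because values are distinct).
occurrences = Counter(x_left)

has_2m_elements = len(x_left) == 2 * m
matches_degree = all(occurrences[values[n]] == degree[n] for n in values)
print(has_2m_elements, matches_degree)
```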

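\[
\begin{aligned}
\sigma(X_L, X_R) &= E[(X_L - E[X_L])(X_R - E[X_R])] \\
&= E[X_L X_R - X_L E[X_R] - E[X_L] X_R + E[X_L] E[X_R]] \\
&= E[X_L X_R] - E[X_L] E[X_R] - E[X_L] E[X_R] + E[X_L] E[X_R] \\
&= E[X_L X_R] - E[X_L] E[X_R].
\end{aligned} \tag{8.19}
\]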
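A small numerical check of this identity, taking $X_L = (18, 21, 21, 20)$ and $X_R = (21, 18, 20, 21)$ and assuming each expectation is the plain average over the $2m$ listed entries. The helper `mean` and the variable names are illustrative only.

```python
def mean(xs):
    """Plain average, used here as the expectation over the 2m entries."""
    return sum(xs) / len(xs)


x_left = [18, 21, 21, 20]
x_right = [21, 18, 20, 21]

e_l, e_r = mean(x_left), mean(x_right)

# First line of the derivation: E[(X_L - E[X_L]) (X_R - E[X_R])].
cov_definition = mean([(l - e_l) * (r - e_r) for l, r in zip(x_left, x_right)])

# Last line of the derivation: E[X_L X_R] - E[X_L] E[X_R].
cov_shortcut = mean([l * r for l, r in zip(x_left, x_right)]) - e_l * e_r

print(cov_definition, cov_shortcut)  # -1.0 -1.0
```

Both forms give the same value. The negative sign reflects that, in this small example, the node with the highest value is connected only to nodes with lower values.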

$E(X_L)$ is the mean (expected value) of variable $X_L$, and $E(X_L X_R)$ is the mean of the product of $X_L$ and $X_R$. In our setting and following

