Social Media Mining: An Introduction


CUUS2079-08 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 17:22


8.1 Measuring Assortativity

[Figure: an undirected graph on nodes A, B, and C labeled with the values 18, 21, and 20.]
Figure 8.4. A Correlation Example.

We construct two variables $X_L$ and $X_R$, where for any edge $(v_i, v_j)$
we assume that $x_i$ is observed from variable $X_L$ and $x_j$ is observed from
variable $X_R$. For Figure 8.4,

$$
X_L = \begin{bmatrix} 18 \\ 21 \\ 21 \\ 20 \end{bmatrix}, \qquad
X_R = \begin{bmatrix} 21 \\ 18 \\ 20 \\ 21 \end{bmatrix}. \tag{8.16}
$$


In other words, $X_L$ represents the ordinal values associated with the left
node of each edge, and $X_R$ represents the values associated with the right
node. Our problem therefore reduces to computing the covariance between
variables $X_L$ and $X_R$. Note that since we are considering an undirected
graph, both edges $(v_i, v_j)$ and $(v_j, v_i)$ exist; therefore, $x_i$ and
$x_j$ are observed in both $X_L$ and $X_R$. Thus, $X_L$ and $X_R$ contain the
same values with the same multiplicities, only in a different order. This
implies that $X_L$ and $X_R$ have the same mean and standard deviation.

$$E(X_L) = E(X_R), \tag{8.17}$$
$$\sigma(X_L) = \sigma(X_R). \tag{8.18}$$
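As a minimal sketch (not the book's code), the construction of $X_L$ and $X_R$ can be written in Python: each undirected edge contributes both of its orientations, after which the two lists hold the same multiset of values and therefore share a mean and standard deviation. The helper name `edge_value_lists` and the letter-to-value assignment are hypothetical; only the values and the adjacency follow from Equation 8.16.

```python
import math
from statistics import mean, pstdev


def edge_value_lists(edges, values):
    """Build (X_L, X_R) for an undirected graph (hypothetical helper).

    Each undirected edge (u, v) is emitted as both (u, v) and (v, u),
    so for m edges each returned list has 2m entries.
    """
    x_left, x_right = [], []
    for u, v in edges:
        # Orientation (u, v): the left endpoint feeds X_L, the right feeds X_R.
        x_left.append(values[u])
        x_right.append(values[v])
        # Orientation (v, u) also exists in an undirected graph.
        x_left.append(values[v])
        x_right.append(values[u])
    return x_left, x_right


# Hypothetical node-to-value assignment consistent with Equation 8.16.
values = {"A": 18, "B": 21, "C": 20}
edges = [("A", "B"), ("B", "C")]

X_L, X_R = edge_value_lists(edges, values)
print(X_L)  # [18, 21, 21, 20]
print(X_R)  # [21, 18, 20, 21]

# Same multiset of values, hence equal mean (8.17) and standard deviation (8.18).
assert sorted(X_L) == sorted(X_R)
assert mean(X_L) == mean(X_R)
assert math.isclose(pstdev(X_L), pstdev(X_R))
```

`pstdev` is used because the quantities here are population statistics over all $2m$ entries, not sample estimates.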

Since the graph has $m$ edges and each edge appears twice in an undirected
graph, $X_L$ and $X_R$ each have $2m$ elements. Each value $x_i$ appears $d_i$
times in each variable, since $v_i$ is an endpoint of $d_i$ edges. The
covariance between $X_L$ and $X_R$ is

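$$
\begin{aligned}
\sigma(X_L, X_R) &= E[(X_L - E[X_L])(X_R - E[X_R])] \\
&= E[X_L X_R - X_L E[X_R] - E[X_L] X_R + E[X_L] E[X_R]] \\
&= E[X_L X_R] - E[X_L] E[X_R] - E[X_L] E[X_R] + E[X_L] E[X_R] \\
&= E[X_L X_R] - E[X_L] E[X_R].
\end{aligned} \tag{8.19}
$$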
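A small numerical check of Equation 8.19 on the Figure 8.4 values may help. This illustrative Python sketch takes $X_L$ and $X_R$ from Equation 8.16 and assumes $m = 2$, the number of undirected edges implied by those lists. It computes the covariance both from the definition and from the shortcut $E[X_L X_R] - E[X_L]E[X_R]$, averaging over all $2m$ entries (a population covariance).

```python
from collections import Counter

# X_L and X_R from Equation 8.16 (Figure 8.4).
X_L = [18, 21, 21, 20]
X_R = [21, 18, 20, 21]
m = 2  # assumed: two undirected edges in Figure 8.4

# Each undirected edge appears twice, so both lists have 2m entries.
assert len(X_L) == len(X_R) == 2 * m
# The value 21 sits on the degree-2 node, so it appears d_i = 2 times in each list.
assert Counter(X_L)[21] == 2 and Counter(X_R)[21] == 2


def mean(xs):
    """Arithmetic mean over all 2m entries."""
    return sum(xs) / len(xs)


mu_l, mu_r = mean(X_L), mean(X_R)

# Definition: E[(X_L - E[X_L])(X_R - E[X_R])]
cov_def = mean([(a - mu_l) * (b - mu_r) for a, b in zip(X_L, X_R)])

# Shortcut from (8.19): E[X_L X_R] - E[X_L] E[X_R]
cov_short = mean([a * b for a, b in zip(X_L, X_R)]) - mu_l * mu_r

print(cov_def, cov_short)  # -1.0 -1.0
```

Both forms agree: $E[X_L X_R] = (378 + 378 + 420 + 420)/4 = 399$ and $E[X_L]E[X_R] = 20 \cdot 20 = 400$, so $\sigma(X_L, X_R) = -1$.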

$E(X_L)$ is the mean (expected value) of variable $X_L$, and $E(X_L X_R)$ is
the mean of the product of $X_L$ and $X_R$. In our setting and following