Social Media Mining: An Introduction

CUUS2079-05 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 19:23

Data Mining Essentials

the average distance value between instances in different clusters. In a well-clustered dataset, the average distance between instances in the same cluster is small (cohesiveness), and the average distance between instances in different clusters is large (separateness). Let a(x) denote the average distance between instance x of cluster C and all other members of C:

$$a(x) = \frac{1}{|C| - 1} \sum_{y \in C,\; y \neq x} \|x - y\|^2. \tag{5.60}$$

Let G ≠ C denote the cluster that is closest to x in terms of the average distance between x and members of G. Let b(x) denote the average distance between instance x and instances in cluster G:

$$b(x) = \min_{G \neq C} \frac{1}{|G|} \sum_{y \in G} \|x - y\|^2. \tag{5.61}$$

Since we want the distance between instances in the same cluster to be smaller than the distance between instances in different clusters, we are interested in a(x) < b(x). The silhouette clustering index is formulated as

$$s(x) = \frac{b(x) - a(x)}{\max(b(x),\, a(x))}, \tag{5.62}$$

$$\text{silhouette} = \frac{1}{n} \sum_{x} s(x). \tag{5.63}$$

The silhouette index takes values in [−1, 1]. The best clustering happens when ∀x, a(x) ≪ b(x); in this case, silhouette ≈ 1. Similarly, when silhouette < 0, many instances are closer to other clusters than to their assigned cluster, which indicates low-quality clustering.
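To see why s(x) stays within [−1, 1], it can help to expand the maximum in Eq. (5.62) case by case. The following is a short worked form, assuming a(x) and b(x) are not both zero (so that the denominator is positive):

$$
s(x) =
\begin{cases}
1 - \dfrac{a(x)}{b(x)}, & a(x) < b(x),\\[4pt]
0, & a(x) = b(x),\\[4pt]
\dfrac{b(x)}{a(x)} - 1, & a(x) > b(x).
\end{cases}
$$

Because a(x) and b(x) are non-negative averages of squared distances, the first branch lies in (0, 1] and the third in [−1, 0). The first branch approaches 1 as the ratio a(x)/b(x) approaches 0, that is, when x is much closer to the members of its own cluster than to those of the nearest other cluster.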

Example 5.9. In Figure 5.8, the a(.), b(.), and s(.) values are

$$a(x_1^1) = |-10 - (-5)|^2 = 25 \tag{5.64}$$

$$b(x_1^1) = \frac{1}{2}\left(|-10 - 5|^2 + |-10 - 10|^2\right) = 312.5 \tag{5.65}$$

$$s(x_1^1) = \frac{312.5 - 25}{312.5} = 0.92 \tag{5.66}$$

$$a(x_2^1) = |-5 - (-10)|^2 = 25 \tag{5.67}$$

$$b(x_2^1) = \frac{1}{2}\left(|-5 - 5|^2 + |-5 - 10|^2\right) = 162.5 \tag{5.68}$$

$$s(x_2^1) = \frac{162.5 - 25}{162.5} \approx 0.85 \tag{5.69}$$
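The arithmetic in this example can be checked with a few lines of Python. The sketch below is illustrative only: it assumes, from the numbers appearing in the equations, that the first cluster contains the one-dimensional instances −10 and −5 and the second contains 5 and 10 (Figure 5.8 itself is not reproduced here). The function names `sq_dist` and `silhouette_scores` are hypothetical helpers, and the code follows Eqs. (5.60) to (5.63) with squared distances.

```python
def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2 between two equal-length tuples."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def silhouette_scores(clusters):
    """Return s(x) for every instance, cluster by cluster (Eqs. 5.60-5.62).

    `clusters` is a list of clusters, each a list of points (tuples).
    Assumes at least two clusters, at least two points per cluster,
    and that the points are not all identical.
    """
    scores = []
    for ci, cluster in enumerate(clusters):
        for xi, x in enumerate(cluster):
            # a(x): average squared distance to the other members of x's cluster
            others = [y for yi, y in enumerate(cluster) if yi != xi]
            a = sum(sq_dist(x, y) for y in others) / len(others)
            # b(x): smallest average squared distance to any other cluster
            b = min(
                sum(sq_dist(x, y) for y in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores.append((b - a) / max(a, b))
    return scores


# Assumed data inferred from the example: cluster 1 = {-10, -5}, cluster 2 = {5, 10}
clusters = [[(-10.0,), (-5.0,)], [(5.0,), (10.0,)]]
scores = silhouette_scores(clusters)
silhouette = sum(scores) / len(scores)  # Eq. (5.63)

print([round(s, 3) for s in scores])  # [0.92, 0.846, 0.846, 0.92]
print(round(silhouette, 3))           # 0.883
```

Under these assumed coordinates, the first two scores reproduce the values computed above for the two instances of the first cluster (0.92 and roughly 0.846), and by symmetry the second cluster yields the same pair in reverse order.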
