CUUS2079-05 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 19:23
Data Mining Essentials

the average distance value between instances in different clusters. In a
well-clustered dataset, the average distance between instances in the same
cluster is small (cohesiveness), and the average distance between instances in
different clusters is large (separateness). Let $a(x)$ denote the average distance
between instance $x$ of cluster $C$ and all other members of $C$:

\[
a(x) = \frac{1}{|C| - 1} \sum_{y \in C,\, y \neq x} \|x - y\|^2. \tag{5.60}
\]

Let $G \neq C$ denote the cluster that is closest to $x$ in terms of the average
distance between $x$ and members of $G$. Let $b(x)$ denote the average distance
between instance $x$ and instances in cluster $G$:

\[
b(x) = \min_{G \neq C} \frac{1}{|G|} \sum_{y \in G} \|x - y\|^2. \tag{5.61}
\]

Since we want distance between instances in the same cluster to be
smaller than distance between instances in different clusters, we are
interested in $a(x) < b(x)$. The silhouette clustering index is formulated as

\[
s(x) = \frac{b(x) - a(x)}{\max(b(x), a(x))}, \tag{5.62}
\]
\[
\text{silhouette} = \frac{1}{n} \sum_{x} s(x), \tag{5.63}
\]

where $n$ is the number of instances.

The silhouette index takes values in $[-1, 1]$. The best clustering
happens when $\forall x,\ a(x) \ll b(x)$. In this case, silhouette $\approx 1$. Similarly,
silhouette $< 0$ indicates that many instances are closer to other clusters
than their assigned cluster, which shows low-quality clustering.

Example 5.9. In Figure 5.8, the $a(\cdot)$, $b(\cdot)$, and $s(\cdot)$ values are

\[
a(x_1^1) = |-10 - (-5)|^2 = 25 \tag{5.64}
\]
\[
b(x_1^1) = \frac{1}{2}\left(|-10 - 5|^2 + |-10 - 10|^2\right) = 312.5 \tag{5.65}
\]
\[
s(x_1^1) = \frac{312.5 - 25}{312.5} = 0.92 \tag{5.66}
\]
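As a numerical check on the values just computed for the instance at −10, the definitions in Equations 5.60 to 5.63 can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not an implementation taken from the text: the names `sq_dist` and `silhouette_values` are hypothetical, and the coordinates (−10 and −5 in one cluster, 5 and 10 in the other) are read off the distances used in this example. It also assumes at least two clusters, at least two instances per cluster, and that $a(x)$ and $b(x)$ are not both zero.

```python
from collections import defaultdict


def sq_dist(x, y):
    """Squared Euclidean distance ||x - y||^2 between two equal-length tuples."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def silhouette_values(points, labels):
    """Return three lists (a, b, s), one entry per instance.

    Assumes >= 2 clusters and >= 2 instances in every cluster.
    """
    clusters = defaultdict(list)
    for p, c in zip(points, labels):
        clusters[c].append(p)

    a_vals, b_vals, s_vals = [], [], []
    for x, c in zip(points, labels):
        own = clusters[c]
        # Eq. 5.60: the y = x term adds 0 to the sum, so dividing the full
        # sum by |C| - 1 averages over the other members of C only.
        a = sum(sq_dist(x, y) for y in own) / (len(own) - 1)
        # Eq. 5.61: smallest average squared distance to any cluster G != C.
        b = min(
            sum(sq_dist(x, y) for y in members) / len(members)
            for g, members in clusters.items()
            if g != c
        )
        # Eq. 5.62: per-instance silhouette value.
        s = (b - a) / max(a, b)
        a_vals.append(a)
        b_vals.append(b)
        s_vals.append(s)
    return a_vals, b_vals, s_vals


# One-dimensional instances assumed from the example's distances.
points = [(-10,), (-5,), (5,), (10,)]
labels = [1, 1, 2, 2]

a, b, s = silhouette_values(points, labels)
overall = sum(s) / len(s)  # Eq. 5.63: mean of s(x) over all n instances

print(a[0], b[0], round(s[0], 2))  # 25.0 312.5 0.92
```

The printed triple corresponds to Equations 5.64 to 5.66; the other entries of `a`, `b`, and `s` hold the values for the remaining instances.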
\[
a(x_2^1) = |-5 - (-10)|^2 = 25 \tag{5.67}
\]
\[
b(x_2^1) = \frac{1}{2}\left(|-5 - 5|^2 + |-5 - 10|^2\right) = 162.5 \tag{5.68}
\]
s(
x^12