Social Media Mining: An Introduction

(Axel Boer) #1

P1: Sqe Trim: 6.125in×9.25in Top: 0.5in Gutter: 0.75in
CUUS2079-05 CUUS2079-Zafarani 978 1 107 01885 3 January 13, 2014 19:23


128 Data Mining Essentials

10
9 8 7 6 5 4 3 2 1 0

12345678910
X

Y

Figure 5.7.k-Means Output on a Sample Dataset. Instances are two-dimensional vectors
shown in the 2-D space.k-means is run withk=6, and the clusters found are visualized
using different symbols.

Clustering validity and the definition of valid clusters are two of the chal-
lenges in the ongoing research.

5.5.1 Clustering Algorithms

There are many clustering algorithm types. In this section, we discussparti-
tionalclustering algorithms, which are the most frequently used clustering
algorithms. In Chapter 6, we discuss two other types of clustering algo-
rithms: spectral clustering and hierarchical clustering.

Partitional Algorithms
Partitional clustering algorithms partition the dataset into a set of clusters.
In other words, each instance is assigned to a cluster exactly once, and no
instance remains unassigned to clusters.k-meansJain and Dubes [1999]
is a well-known example of a partitional algorithm. The output of thek-
means algorithm (k=6) on a sample dataset is shown in Figure5.7.In
this figure, the dataset has two features, and instances can be visualized
in a two-dimensional space. The instances are shown using symbols that
represent the cluster to which they belong. The pseudocode fork-means
algorithm is provided in Algorithm5.2.
Free download pdf