312 Clustering
parameter. In practice, it is often up to the user of the clustering algorithm to
choose the parameterkthat is most suitable for the given clustering problem.
In the following we describe some of the most common objective functions.
- Thek-means objective functionis one of the most popular clustering
objectives. Ink-means the data is partitioned into disjoint setsC 1 ,...,Ck
where eachCiis represented by a centroidμi. It is assumed that the input
setXis embedded in some larger metric space (X′,d) (so thatX ⊆ X′)
and centroids are members ofX′. Thek-means objective function measures
the squared distance between each point inXto the centroid of its cluster.
The centroid ofCiis defined to be
μi(Ci) = argmin
μ∈X′
∑
x∈Ci
d(x,μ)^2.
Then, thek-means objective is
Gk−means((X,d),(C 1 ,...,Ck)) =
∑k
i=1
∑
x∈Ci
d(x,μi(Ci))^2.
This can also be rewritten as
Gk−means((X,d),(C 1 ,...,Ck)) = min
μ 1 ,...μk∈X′
∑k
i=1
∑
x∈Ci
d(x,μi)^2. (22.1)
Thek-means objective function is relevant, for example, in digital com-
munication tasks, where the members ofXmay be viewed as a collection
of signals that have to be transmitted. WhileXmay be a very large set
of real valued vectors, digital transmission allows transmitting of only a
finite number of bits for each signal. One way to achieve good transmis-
sion under such constraints is to represent each member ofXby a “close”
member of some finite setμ 1 ,...μk, and replace the transmission of any
x∈ Xby transmitting the index of the closestμi. Thek-means objective
can be viewed as a measure of the distortion created by such a transmission
representation scheme.
- Thek-medoids objective functionis similar to thek-means objective,
except that it requires the cluster centroids to be members of the input
set. The objective function is defined by
GK−medoid((X,d),(C 1 ,...,Ck)) =μ min
1 ,...μk∈X
∑k
i=1
∑
x∈Ci
d(x,μi)^2.
- Thek-median objective functionis quite similar to thek-medoids objec-
tive, except that the “distortion” between a data point and the centroid
of its cluster is measured by distance, rather than by the square of the
distance:
GK−median((X,d),(C 1 ,...,Ck)) =μ min
1 ,...μk∈X
∑k
i=1
∑
x∈Ci
d(x,μi).