The Art of R Programming

discuss this in detail when we talk about general performance considerations
in Section 16.4.1.

16.2.4 Extended Example: K-Means Clustering


To learn more about the capabilities of snow, we’ll look at another example,
this one involving k-means clustering (KMC).
KMC is a technique for exploratory data analysis. In looking at scatter
plots of your data, you may have the perception that the observations tend
to cluster into groups, and KMC is a method for finding such groups. The
output consists of the centroids of the groups.
The following is an outline of the algorithm:

for iter = 1,2,...,niters
   set vector and count totals to 0
   for i = 1,...,nrow(m)
      set j = index of the closest group center to m[i,]
      add m[i,] to the vector total for group j, v[j]
      add 1 to the count total for group j, c[j]
   for j = 1,...,ngrps
      set new center of group j = v[j] / c[j]

Here, we specify niters iterations, with initcenters as our initial guesses
for the centers of the groups. Our data is in the matrix m, and there are ngrps
groups.
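Before turning to the parallel version, it may help to see the outline as ordinary single-process R. The following is only a minimal sketch under stated assumptions, not code from this section: the function name kmc_serial is hypothetical, distance is assumed to be the sum of absolute componentwise differences, and a group that attracts no points is assumed to keep its previous center (the outline does not say what to do in that case).

```r
# Serial sketch of the KMC outline; kmc_serial is a hypothetical name.
# m: data matrix (one observation per row)
# niters: number of iterations
# initcenters: matrix of initial guesses, one center per row
kmc_serial <- function(m, niters, initcenters) {
  ngrps <- nrow(initcenters)
  centers <- initcenters
  for (iter in seq_len(niters)) {
    v <- matrix(0, nrow = ngrps, ncol = ncol(m))  # vector totals, one row per group
    cnt <- numeric(ngrps)                         # count totals, one per group
    for (i in seq_len(nrow(m))) {
      # assumed distance: sum of absolute componentwise differences
      d <- rowSums(abs(sweep(centers, 2, m[i, ])))
      j <- which.min(d)             # index of the closest group center
      v[j, ] <- v[j, ] + m[i, ]     # add the point to group j's vector total
      cnt[j] <- cnt[j] + 1          # add 1 to group j's count total
    }
    for (j in seq_len(ngrps)) {
      # assumption: an empty group keeps its previous center
      if (cnt[j] > 0) centers[j, ] <- v[j, ] / cnt[j]
    }
  }
  centers
}

# Small made-up data set with two visibly separate clumps
m <- rbind(c(1, 1), c(1, 2), c(2, 1), c(9, 9), c(9, 10), c(10, 9))
initcenters <- rbind(c(0, 0), c(10, 10))
ctrs <- kmc_serial(m, niters = 3, initcenters = initcenters)
print(ctrs)
```

The inner loop over the rows of m is where nearly all of the work happens, and each row is processed independently of the others within an iteration, which is what makes the computation a natural candidate for splitting across workers.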
The following is the snow code to compute KMC in parallel:

# snow version of k-means clustering problem

library(snow)

# returns distances from x to each vector in y;
# here x is a single vector and y is a bunch of them;
# define distance between 2 points to be the sum of the absolute values
# of their componentwise differences; e.g., distance between (5,4.2) and
# (3,5.6) is 2 + 1.4 = 3.4
dst <- function(x,y) {
   # x is recycled against y, so y must hold its vectors one after
   # another, each of length(x)
   tmpmat <- matrix(abs(x-y),byrow=TRUE,ncol=length(x))  # note recycling
   rowSums(tmpmat)
}

# will check this worker's mchunk matrix against currctrs, the current
# centers of the groups, returning a matrix; row j of the matrix will
# consist of the vector sum of the points in mchunk closest to jth
# current center, and the count of such points
findnewgrps <- function(currctrs) {
   ngrps <- nrow(currctrs)
   spacedim <- ncol(currctrs)  # what dimension space are we in?
