over all pairs of websites in our data set. This mean can be found using the
following outline, for an n-by-n matrix:
1 sum = 0
2 for i = 0...n-1
3    for j = i+1...n-1
4       for k = 0...n-1 sum = sum + a[i][k]*a[j][k]
5 mean = sum / (n*(n-1)/2)
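The outline above translates almost line for line into R. The following is a minimal sketch rather than a tuned implementation: the function name mutual_mean_loop and the small test matrix are made up for illustration, the matrix is assumed to be a 0/1 adjacency matrix with at least two rows, and the indices run from 1 to n because R vectors and matrices are 1-based, unlike the 0-based outline.

```r
# Mean number of mutual outlinks over all pairs of sites, following the
# outline directly (an explicit triple loop: slow, but easy to check).
# mutual_mean_loop is a hypothetical name; a is assumed to be an n-by-n
# 0/1 adjacency matrix with n >= 2.
mutual_mean_loop <- function(a) {
  n <- nrow(a)
  total <- 0
  for (i in seq_len(n - 1)) {        # i = 1, ..., n-1
    for (j in (i + 1):n) {           # j = i+1, ..., n
      for (k in seq_len(n)) {        # k = 1, ..., n
        total <- total + a[i, k] * a[j, k]
      }
    }
  }
  total / (n * (n - 1) / 2)          # divide by the number of pairs
}

# Small made-up example: every site links to the other two.
a <- matrix(c(0, 1, 1,
              1, 0, 1,
              1, 1, 0), nrow = 3, byrow = TRUE)
mutual_mean_loop(a)
```

In this example each pair of sites shares exactly one outlink target, so the mean is 1.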
Given that our graph could contain thousands, or even millions, of websites,
our task could entail quite large amounts of computation. A common
approach to dealing with this problem is to divide the computation into
smaller chunks and then process each of the chunks simultaneously, say
on separate computers.
Let’s say that we have two computers at our disposal. We might have one
computer handle all the odd values of i in the for i loop in line 2 and have
the second computer handle the even values. Or, since dual-core computers
are fairly standard these days, we could take this same approach on a single
computer. This may sound simple, but a number of major issues can arise, as
you’ll learn in this chapter.
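To make the odd/even split concrete, here is a rough sketch in plain R, assuming 1-based indices and a small random 0/1 matrix. It does not run anything in parallel: the two chunks are processed one after the other, standing in for the two computers, and the helper name partial_sum is hypothetical. The point is only that the per-chunk sums are independent and can be added together at the end.

```r
# Sum of mutual outlinks for each row i in ichunk, paired with every j > i.
# partial_sum is a hypothetical helper name, not part of any package.
partial_sum <- function(ichunk, a) {
  n <- nrow(a)
  total <- 0
  for (i in ichunk) {
    if (i < n) {                     # row n has no later row to pair with
      for (j in (i + 1):n) {
        total <- total + sum(a[i, ] * a[j, ])
      }
    }
  }
  total
}

set.seed(1)                          # arbitrary seed, for reproducibility
n <- 6
a <- matrix(sample(0:1, n * n, replace = TRUE), nrow = n)

odd_i  <- seq(1, n, by = 2)          # values of i for the first computer
even_i <- seq(2, n, by = 2)          # values of i for the second computer

# Each chunk needs only the matrix and its own list of i values.
sum_odd  <- partial_sum(odd_i, a)
sum_even <- partial_sum(even_i, a)

# Combine the partial results, then compare with the unsplit computation.
mean_split <- (sum_odd + sum_even) / (n * (n - 1) / 2)
mean_full  <- partial_sum(1:n, a) / (n * (n - 1) / 2)
```

Both means agree, since every value of i is handled by exactly one of the two chunks.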
16.2 Introducing the snow Package
Luke Tierney’s snow (Simple Network of Workstations) package, available
from the CRAN R code repository, is arguably the simplest, easiest-to-use
form of parallel R and one of the most popular.
NOTE The CRAN Task View page on parallel R,
http://cran.r-project.org/web/views/HighPerformanceComputing.html,
has a fairly up-to-date list of available parallel R packages.
To see how snow works, here’s code for the mutual outlinks problem
described in the previous section:
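# snow version of mutual links problem

# count the mutual outlinks for the rows listed in ichunk of the
# adjacency matrix m, pairing each row i with all rows below it
mtl <- function(ichunk,m) {
   n <- ncol(m)
   matches <- 0
   for (i in ichunk) {
      if (i < n) {  # the last row has no later row to pair with
         rowi <- m[i,]
         # inner products of row i with rows i+1, ..., n
         matches <- matches +
            sum(m[(i+1):n,] %*% rowi)
      }
   }
   matches
}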