The Art of R Programming

another of 20 characters. From A’s point of view, that’s two lines, but to
TCP/IP, it’s just 28 characters of a yet incomplete message. Splitting that
long message back into lines can take a bit of doing. R provides various
functions for this purpose, including the following:


  • readLines() and writeLines(): These allow you to program as if TCP/IP
    were sending messages line by line, even though this is not actually the
    case. If your application is naturally viewed in terms of lines, these two
    functions can be quite handy.

  • serialize() and unserialize(): You can use these to send R objects, such
    as a matrix or the complex output of a call to a statistical function. The
    object is converted to character string form by the sender and then
    converted back to the original object form at the receiver.

  • readBin() and writeBin(): These are for sending data in binary form.
    (Recall the comment on terminology at the beginning of Section 10.2.2.)
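
The three function pairs above can be sketched without a network at all. This is a minimal illustration, not the book's example: a temporary file and in-memory raw vectors stand in for the socket, and the variable names (tmp, m, x, and so on) are hypothetical.

```r
# Line-oriented I/O: write two lines to a connection, then read them back.
tmp <- tempfile()                      # hypothetical stand-in for a socket
con <- file(tmp, open = "w")
writeLines(c("first line", "second line"), con)
close(con)
con <- file(tmp, open = "r")
lines_back <- readLines(con)
close(con)

# R objects: serialize() to bytes, unserialize() back to the object.
# connection = NULL returns a raw vector instead of writing to a connection.
m <- matrix(1:6, nrow = 2)
m_back <- unserialize(serialize(m, connection = NULL))

# Binary data: writeBin() and readBin() also accept a raw vector.
x <- c(1.5, 2.25, 3)
x_back <- readBin(writeBin(x, raw()), what = "double", n = length(x))

print(lines_back)
print(identical(m, m_back))
print(identical(x, x_back))
```

With a real socket, the same calls would take the socket connection as their connection argument.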


Each of these functions operates on R connections, as you’ll see in the next
example.

It’s important to choose the right function for each job. If you have a
long vector, for example, using serialize() and unserialize() may be more
convenient but far more time-consuming. This is not only because numbers
must be converted to and from their character representations but also
because the character representation is typically much longer, which means
greater transmission time.
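
The size difference can be checked directly. The sketch below assumes the character form of serialization, requested with ascii = TRUE; in current versions of R, serialize() defaults to a more compact binary format, so the gap is largest in the ASCII case. The vector x is a hypothetical test input.

```r
x <- sqrt(1:1000)   # 1,000 doubles, most of which need many digits in text form

# Character (ASCII) serialization versus raw binary, both as raw vectors.
ascii_bytes <- serialize(x, connection = NULL, ascii = TRUE)
bin_bytes   <- writeBin(x, raw())

print(length(bin_bytes))                        # 8 bytes per double
print(length(ascii_bytes) > length(bin_bytes))  # is the text form longer?
```

The binary form costs a fixed 8 bytes per number, while the text form spends roughly one byte per digit, plus separators and a header describing the object.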
Here are two other R socket functions:


  • socketConnection(): This establishes an R connection via sockets. You
    specify the port number in the argument port, and state whether a
    server or client is to be created, by setting the argument server to TRUE
    or FALSE, respectively. In the client case, you must also supply the server’s
    IP address in the argument host.

  • socketSelect(): This is useful when a server is connected to multiple
    clients. Its main argument, socklist, is a list of connections, and its
    return value is a logical vector, one element per connection, indicating
    which connections have data ready for the server to read.
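
Here is a rough sketch of both functions in a single R session. It rests on several assumptions. The port number 6011 is arbitrary and must be free. The server side uses serverSocket() and socketAccept(), which require R 4.0.0 or later, rather than socketConnection() with server set to TRUE, because the latter blocks until a client in another process connects. All variable names are hypothetical.

```r
port <- 6011L   # hypothetical port; any free port will do

# Listening socket (requires R >= 4.0.0).
listener <- serverSocket(port)

# Two clients connect to the server on the local machine.
client1 <- socketConnection(host = "127.0.0.1", port = port, server = FALSE,
                            blocking = TRUE, open = "r+")
client2 <- socketConnection(host = "127.0.0.1", port = port, server = FALSE,
                            blocking = TRUE, open = "r+")

# The server accepts both pending connections.
server_conns <- list(socketAccept(listener, blocking = TRUE, open = "r+"),
                     socketAccept(listener, blocking = TRUE, open = "r+"))

# Only the first client sends a line.
writeLines("hello from client 1", client1)

# Ask which server-side connections have data waiting (up to 5 seconds),
# then read one line from the first connection that is ready.
ready <- socketSelect(server_conns, timeout = 5)
msg <- readLines(server_conns[[which(ready)[1]]], n = 1)

print(ready)
print(msg)

# Clean up all connections.
for (con in c(list(client1, client2, listener), server_conns)) close(con)
```

In a real application, the clients would normally run in separate R processes, possibly on other machines, and the server would call socketSelect() in a loop.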


10.3.3 Extended Example: Implementing Parallel R


Some statistical analyses have very long runtimes, so there naturally has been
quite a bit of interest in “parallel R,” in which several R processes cooperate
on a given task. Another possible reason to “go parallel” is memory limitations.
If one machine does not have enough memory for the task at hand, it
may help to pool the memories of several machines in some way. Chapter 16
gives an introduction to this important topic.
Sockets play a key role in many parallel R packages. The cooperating R
processes could be either on the same machine or on separate machines. In
the latter case (and even the former), a natural approach to implementing
parallelism is to use R sockets. This is one of the choices in the snow package
