The Art of R Programming

But what if you want to search for a period using grep()? Here’s the
naive approach:

> grep(".",c("abc","de","f.g"))
[1] 1 2 3

The result should have been 3, not (1,2,3). This call failed because periods
are metacharacters: in a regular expression, a period matches any single
character. You need to escape the metacharacter nature of the period, which
is done via a backslash:

> grep("\\.",c("abc","de","f.g"))
[1] 3

Now, didn’t I say a backslash? Then why are there two? Well, the sad
truth is that the backslash itself must be escaped, which is accomplished by
its own backslash! This goes to show how arcanely complex regular expressions
can become. Indeed, a number of books have been written on the subject
of regular expressions (for various programming languages). As a start
in learning about the topic, refer to R’s online help (type ?regex).
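
The two levels of escaping can be checked directly by inspecting the pattern string. The following is a small sketch using only base R; the variable names are illustrative and not from the text. It also shows two common ways to avoid the double backslash, assuming a literal match is all that is wanted:

```r
x <- c("abc", "de", "f.g")

# The R string "\\." holds two characters: a backslash and a period.
# R's string parser consumes one backslash; the regex engine then sees \.
pat <- "\\."
n_chars <- nchar(pat)

# Escaped period: matches only the element containing a literal period
idx_escaped <- grep(pat, x)

# Alternative 1: fixed = TRUE turns off regex interpretation entirely
idx_fixed <- grep(".", x, fixed = TRUE)

# Alternative 2: inside a character class, the period is taken literally
idx_class <- grep("[.]", x)
```

All three calls select the same element of x, the string "f.g".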

11.2.1 Extended Example: Testing a Filename for a Given Suffix


Suppose we wish to test for a specified suffix in a filename. We might, for
instance, want to find all HTML files (those with suffix .html, .htm, and so
on). Here is code for that:

1 testsuffix <- function(fn,suff) {
2    parts <- strsplit(fn,".",fixed=TRUE)
3    nparts <- length(parts[[1]])
4    return(parts[[1]][nparts] == suff)
5 }


Let’s test it.

> testsuffix("x.abc","abc")
[1] TRUE
> testsuffix("x.abc","ac")
[1] FALSE
> testsuffix("x.y.abc","ac")
[1] FALSE
> testsuffix("x.y.abc","abc")
[1] TRUE

How does the function work? First note that the call to strsplit() on
line 2 returns a list consisting of one element (because fn is a one-element
vector), which is itself a vector of strings. For example, calling
testsuffix("x.y.abc","abc") will result in parts being a list consisting of
a three-element vector with elements x, y, and abc. We then pick up the last
element and compare it to suff.
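
To make the structure returned by strsplit() concrete, the sketch below inspects it for the filename used above. It then shows a hypothetical regex-based variant, testsuffix_re(), which is not from the text and assumes that suff contains no regex metacharacters:

```r
# Inspect what strsplit() returns for a one-element character vector.
parts <- strsplit("x.y.abc", ".", fixed = TRUE)
last <- parts[[1]][length(parts[[1]])]   # the final piece of the split

# Hypothetical regex-based variant (an illustration, not the text's method).
# Assumes suff contains no regex metacharacters.
testsuffix_re <- function(fn, suff) {
  # Pattern: a literal period, then the suffix, anchored at the end by $
  grepl(paste0("\\.", suff, "$"), fn)
}

r1 <- testsuffix_re("x.y.abc", "abc")   # suffix present
r2 <- testsuffix_re("x.abc", "ac")      # suffix absent
r3 <- testsuffix_re("abc", "abc")       # no period in the name at all
```

One difference in behavior is worth noting: for a name with no period, such as "abc", the split-based testsuffix("abc","abc") returns TRUE, because the single piece produced by strsplit() equals suff, whereas the anchored pattern requires a period before the suffix and so returns FALSE.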
