The Art of R Programming

We can sort by word frequency in a similar manner.

1  # orders the output of findwords() by word frequency
2  freqwl <- function(wrdlst) {
3     freqs <- sapply(wrdlst,length)  # get word frequencies
4     return(wrdlst[order(freqs)])
5  }


In line 3, we are using the fact that each element in wrdlst is a vector
of numbers representing the positions in our input file at which the given
word is found. Calling length() on that vector gives us the number of times
the given word appears in that file. The result of calling sapply() will be the
vector of word frequencies.
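
To make the frequency step concrete, here is a minimal sketch on a small, made-up word list. The names wl_toy and freqs_toy and the positions in the list are assumptions for illustration only; they are not the output of findwords() on the actual input file.

```r
# Hypothetical word list: each element holds the positions at which
# one word occurs (illustrative data, not from the book's input file)
wl_toy <- list(the = c(1, 5, 9), art = 2, of = c(3, 7))

# sapply() calls length() on each element and simplifies the results
# into a named integer vector, one frequency per word
freqs_toy <- sapply(wl_toy, length)
print(freqs_toy)
```

Each entry of freqs_toy carries the name of its word, so the frequencies stay lined up with the elements of the list.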
We could use sort() here again, but order() is more direct. This latter
function returns the indices of a sorted vector with respect to the original
vector. Here’s an example:

> x <- c(12,5,13,8)
> order(x)
[1] 2 4 1 3

The output here indicates that x[2] is the smallest element in x, x[4]
is the second smallest, and so on. In our case, we use order() to determine
which word is least frequent, second least frequent, and so on. Plugging
these indices into our word list gives us the same word list but in order of
frequency.
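
The indexing step can be sketched on its own as follows. The toy list wl_toy and the variable names are hypothetical, and the decreasing = TRUE variant is an added illustration rather than part of freqwl().

```r
x <- c(12, 5, 13, 8)
idx <- order(x)      # indices of x from smallest to largest element
sorted_x <- x[idx]   # indexing by those indices gives the same result as sort(x)

# Hypothetical word list (illustrative data only)
wl_toy <- list(the = c(1, 5, 9), art = 2, of = c(3, 7))
freqs_toy <- sapply(wl_toy, length)

# Least frequent word first, as in freqwl()
by_freq <- wl_toy[order(freqs_toy)]

# Variation: most frequent word first
by_freq_desc <- wl_toy[order(freqs_toy, decreasing = TRUE)]

print(names(by_freq))
print(names(by_freq_desc))
```

Because order() returns positions rather than values, the same index vector can reorder any object that is parallel to the frequencies, which is why it suits a list here.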
Let’s check it.

> freqwl(wl)
$here
[1] 2

$means
[1] 3

$first
[1] 6
...
$that
[1] 4 40

$`in`
[1] 8 15

$line
[1] 10 24
...
$this
