The actual separation into sound components is done in line 8. Here,
we take a column of romanizations, such as the following:
yat1
yuet3
ding1
chat1
naai5
gau2
We split it into three columns, consisting of initial consonant, remainder
of the sound, and tone. For instance, yat1 will be split into y, at, and 1.
This is a very natural candidate for some kind of “apply” function, and
indeed sapply() is used in line 8. Of course, this call requires that we write
a suitable function to be applied. (If we had been lucky, there would have
been an existing R function that worked, but no such good fortune here.)
The function we use is sepsoundtone(), starting in line 26.
The sepsoundtone() function makes heavy use of R’s substr() (for substring)
function, described in detail in Chapter 11. In line 31, for example,
we loop until we collect all the initial consonants, such as ch. The return value,
in line 40, consists of the three sound components extracted from the given
romanized form, the formal parameter pronun.
Note the use of R’s built-in constant, letters, in line 37. We use this to
sense whether a given character is numeric, which means it’s a tone. Some
romanizations are toneless.
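To make the splitting concrete, here is a minimal sketch of the idea. It is not the sepsoundtone() listing referred to above. The name split_sound() is hypothetical, and we assume (as a simplification) that the initial consonant is everything before the first of the letters a, e, i, o, u.

```r
# Hypothetical helper, not the book's sepsoundtone(): split one romanization
# into initial consonant, remainder of the sound, and tone.
split_sound <- function(pronun) {
  n <- nchar(pronun)
  last <- substr(pronun, n, n)
  if (last %in% letters) {
    # the final character is a (lowercase) letter, so there is no tone
    tone <- NA_character_
    body <- pronun
  } else {
    tone <- last
    body <- substr(pronun, 1, n - 1)
  }
  # assumption: the initial consonant runs up to the first a, e, i, o, or u
  pos <- as.integer(regexpr("[aeiou]", body))
  if (pos == -1) pos <- nchar(body) + 1   # no vowel found at all
  c(initial = substr(body, 1, pos - 1),
    rest = substr(body, pos, nchar(body)),
    tone = tone)
}

split_sound("yat1")    # components y, at, 1
split_sound("chat1")   # components ch, at, 1
split_sound("gau")     # toneless: the tone component is NA
```

Note that letters contains only lowercase letters, so this sketch assumes lowercase romanizations.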
Line 8 will then return a 3-by-n matrix, where n is the number of
romanizations, with one row for each of the three sound components and one
column for each romanization. We wish to convert this to a data frame for
merging with outdf in line 19, and we prepare for this in line 10.
Note that we call the matrix transpose function t() to put our information
into columns rather than rows. This is needed because data-frame storage
is by columns. Also, we include a column fy[,1], the Chinese characters
themselves, to have a column in common in the call to merge() in line 19.
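The transpose-then-merge pattern can be tried on a small scale as follows. This is only a sketch under stated assumptions: the data frame, its column names, and the toy splitter split3() are hypothetical stand-ins, with placeholder labels A through D used in place of Chinese characters. The toy splitter simply takes the first character as the initial, which is good enough for these four sounds but not in general.

```r
# Hypothetical stand-in for one fangyan data frame
fy <- data.frame(Ch = c("A", "B", "C", "D"),
                 Can = c("yat1", "ding1", "gau2", "naai5"),
                 stringsAsFactors = FALSE)

# Toy splitter (assumption: single-letter initial, final character is the tone)
split3 <- function(p) {
  n <- nchar(p)
  c(substr(p, 1, 1), substr(p, 2, n - 1), substr(p, n, n))
}

# sapply() stacks the results as columns: 3 rows, one column per romanization
m <- sapply(fy$Can, split3, USE.NAMES = FALSE)
dim(m)

# transpose so that each sound component becomes a column, then attach the
# character column so merge() has a column in common with fy
parts <- data.frame(fy[, 1], t(m), stringsAsFactors = FALSE)
names(parts) <- c("Ch", "Init", "Remainder", "Tone")

merged <- merge(fy, parts)
merged
```

Here merge() matches rows on Ch, the only column name the two data frames share, so the result does not depend on the rows being in the same order.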
Now let’s turn to the code for mapsound(), which actually is simpler than
the preceding merging code.
1  mapsound <- function(df,fromcol,tocol,sourceval) {
2     base <- which(df[[fromcol]] == sourceval)
3     basedf <- df[base,]
4     # determine which rows of basedf correspond to the various mapped
5     # values
6     sp <- split(basedf,basedf[[tocol]])
7     retval <- list()
8     retval$counts <- sapply(sp,nrow)
9     retval$images <- sp
10    return(retval)
11 }
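To see what mapsound() returns, we can try it on a tiny data frame. The sketch below repeats the function without line numbers so that it runs on its own. The data frame toy, its column names, and its values are made up for illustration and are not real pronunciation data.

```r
mapsound <- function(df, fromcol, tocol, sourceval) {
  base <- which(df[[fromcol]] == sourceval)
  basedf <- df[base, ]
  # group the matching rows by the value they map to in tocol
  sp <- split(basedf, basedf[[tocol]])
  retval <- list()
  retval$counts <- sapply(sp, nrow)
  retval$images <- sp
  return(retval)
}

# Hypothetical data: placeholder characters with made-up initial consonants
toy <- data.frame(Ch = c("A", "B", "C", "D"),
                  FromInit = c("y", "y", "y", "g"),
                  ToInit = c("y", "r", "y", "j"),
                  stringsAsFactors = FALSE)

m <- mapsound(toy, "FromInit", "ToInit", "y")
m$counts      # number of rows per mapped value: one for r, two for y
m$images$y    # the rows of toy in which y maps to y
```

The counts component summarizes how the source value is distributed over the target column, and the images component keeps the matching rows themselves, one data frame per mapped value.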