The Art of R Programming


6.2 Common Functions Used with Factors


With factors, we have yet another member of the family of apply functions,
tapply(). We’ll look at that function, as well as two other functions commonly
used with factors: split() and by().

6.2.1 The tapply() Function


As motivation, suppose we have a vector x of ages of voters and a factor f
showing some nonnumeric trait of those voters, such as party affiliation
(Democrat, Republican, Unaffiliated). We might wish to find the mean
age in x within each of the party groups.
In typical usage, the call tapply(x,f,g) has x as a vector, f as a factor or
list of factors, and g as a function. The function g() in our little example
above would be R’s built-in mean() function. If we wanted to group by both
party and another factor, say gender, we would need f to be a list of the two
factors, party and gender.
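To make the two-factor case concrete, here is a small sketch that reuses the six ages and party codes from the example in this section and adds a gender vector. The gender values are hypothetical, made up only for illustration. Passing list(affils, genders) as f makes tapply() return a matrix with one cell per party/gender combination.

```r
# Ages and party codes as in the tapply() example in this section
ages    <- c(25, 26, 55, 37, 21, 42)
affils  <- c("R", "D", "D", "R", "U", "D")
# Hypothetical gender codes, invented for this illustration
genders <- c("M", "F", "M", "F", "F", "M")

# f is a list of two factors, so the groups are party-by-gender cells
m <- tapply(ages, list(affils, genders), mean)
print(m)
# Rows are parties (D, R, U) and columns are genders (F, M).
# The U/M cell is NA because no voter falls in that group.
```

Empty combinations come back as NA (the value of tapply()'s default argument) rather than being dropped, so the result keeps a regular grid shape.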
Each factor in f must have the same length as x. This makes sense in
light of the voter example above; we should have as many party affiliations
as ages. If a component of f is a vector, it will be coerced into a factor by
applying as.factor() to it.
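Both rules can be checked directly. The sketch below, using the ages and party codes from the example in this section, shows that a plain character vector and an explicit factor give the same tapply() result, and that a grouping vector of the wrong length makes tapply() stop with an error.

```r
ages   <- c(25, 26, 55, 37, 21, 42)
affils <- c("R", "D", "D", "R", "U", "D")

# A character vector is coerced with as.factor(), so these calls agree
by_chr <- tapply(ages, affils, mean)
by_fac <- tapply(ages, as.factor(affils), mean)
identical(by_chr, by_fac)    # TRUE

# Dropping the last party code leaves 5 labels for 6 ages, so tapply() errors
bad <- try(tapply(ages, affils[-6], mean), silent = TRUE)
inherits(bad, "try-error")   # TRUE
```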
The operation performed by tapply() is to (temporarily) split x into
groups, each group corresponding to a level of the factor (or a combination
of levels of the factors in the case of multiple factors), and then apply
g() to the resulting subvectors of x. Here’s a little example:

> ages <- c(25,26,55,37,21,42)
> affils <- c("R","D","D","R","U","D")
> tapply(ages,affils,mean)
 D  R  U
41 31 21

Let’s look at what happened. The function tapply() treated the vector
("R","D","D","R","U","D") as a factor with levels "D", "R", and "U". It noted that
"D" occurred in indices 2, 3, and 6; "R" occurred in indices 1 and 4; and "U"
occurred in index 5. For convenience, let’s refer to the three index vectors
(2,3,6), (1,4), and (5) as x, y, and z, respectively. Then tapply() computed
mean(ages[x]), mean(ages[y]), and mean(ages[z]) and returned those means in a
three-element vector. That vector’s element names are "D", "R", and "U",
reflecting the factor levels that tapply() used.
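The grouping logic just described can be reproduced by hand. This is a minimal sketch of the idea, not R's actual tapply() source: for each level it finds the index vector with which(), applies mean() to the matching ages, and then shows the same computation written with split(), which this section introduces alongside tapply().

```r
ages   <- c(25, 26, 55, 37, 21, 42)
affils <- c("R", "D", "D", "R", "U", "D")

# Levels in the order tapply() uses them: "D", "R", "U"
lvls <- levels(as.factor(affils))

# For each level, which() gives its index vector (2, 3, 6 for "D"),
# and mean() is applied to the ages at those positions
manual <- sapply(lvls, function(lv) mean(ages[which(affils == lv)]))
print(manual)   # D 41, R 31, U 21

# Equivalent: split() builds the subvectors, sapply() applies mean() to each
via_split <- sapply(split(ages, affils), mean)
```

One difference from tapply() itself: these versions return a plain named vector, whereas tapply() returns a one-dimensional array with the same names and values.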
What if we have two or more factors? Then each factor yields a set of
groups, as in the preceding example, and the groups are ANDed together.
As an example, suppose that we have an economic data set that includes
variables for gender, age, and income. Here, the call tapply(x,f,g) might have
x as income and f as a pair of factors: one for gender and the other coding
whether the person is older or younger than 25. We may be interested in
