The Art of R Programming

The sorting approach in line 7, which makes use of order(), is the standard
way to sort a data frame (worth remembering, since the situation arises
rather frequently).
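As a minimal sketch of that idiom, the following sorts a small data frame by one column. The data frame d and its columns word and freq are made up for illustration and are not the data from the listing discussed above.

```r
# Hypothetical data frame, invented for this sketch
d <- data.frame(word = c("the", "of", "and", "a"),
                freq = c(12, 7, 9, 7),
                stringsAsFactors = FALSE)

# order() returns the row indices that put freq in increasing order
idx <- order(d$freq)

# Indexing the rows by those indices yields the sorted data frame
sorted <- d[idx, ]

# The same idea in decreasing order
sorted_desc <- d[order(d$freq, decreasing = TRUE), ]
```

Note that order() does not sort anything itself; it only reports the permutation, which is why it is used as a row index.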
The approach taken here, converting a table to a data frame, could
also be used in Section 6.3.2. However, you would need to be careful to
remove levels from the factors to avoid zeros in cells.
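The point about leftover levels can be illustrated with a small sketch. The factor f below is invented for illustration, and droplevels() is one way (assumed here, not necessarily what the author had in mind) to discard levels that no longer occur.

```r
# Hypothetical factor with three levels
f <- factor(c("a", "b", "a", "c"))

# Subsetting keeps the unused level "c" attached to the result
sub <- f[f != "c"]

# The table still has a cell for "c", with a count of zero
with_zero <- as.data.frame(table(sub))

# Dropping unused levels first removes the zero cell
no_zero <- as.data.frame(table(droplevels(sub)))
```

Here with_zero has three rows, one of them with Freq equal to 0, while no_zero has only the two rows for "a" and "b".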

6.4 Other Factor- and Table-Related Functions


R includes a number of other functions that are handy for working with
tables and factors. We’ll discuss two of them here: aggregate() and cut().

NOTE Hadley Wickham’s reshape package “lets you flexibly restructure and aggregate data
using just two functions: melt and cast.” This package may take a while to learn,
but it is extremely powerful. His plyr package is also quite versatile. You can
download both packages from R’s CRAN repository. See Appendix B for more details about
downloading and installing packages.

6.4.1 The aggregate() Function


The aggregate() function calls tapply() once for each variable in a group. For
example, in the abalone data, we could find the median of each variable,
broken down by gender, as follows:

> aggregate(aba[,-1],list(aba$Gender),median)
  Group.1 Length Diameter Height WholeWt ShuckedWt ViscWt ShellWt Rings
1       F  0.590    0.465  0.160 1.03850   0.44050 0.2240   0.295    10
2       I  0.435    0.335  0.110 0.38400   0.16975 0.0805   0.113     8
3       M  0.580    0.455  0.155 0.97575   0.42175 0.2100   0.276    10

The first argument, aba[,-1], is the entire data frame except for the first
column, which is Gender itself. The second argument, which must be a list,
is our Gender factor as before. Finally, the third argument tells R to compute
the median on each column in each of the data frames generated by the
subgrouping corresponding to our factors. There are three such subgroups
in our example here and thus three rows in the output of aggregate().
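To see the relationship with tapply() on something small enough to check by hand, here is a sketch using a made-up six-row data frame. It is a stand-in for illustration only, not the abalone data, and the column values are invented.

```r
# Hypothetical miniature data set (not the abalone data)
d <- data.frame(Gender = c("F", "M", "F", "I", "M", "I"),
                Length = c(0.50, 0.60, 0.70, 0.30, 0.40, 0.50),
                Rings  = c(10, 9, 12, 7, 11, 8),
                stringsAsFactors = FALSE)

# Median of every non-Gender column, one row per Gender value
agg <- aggregate(d[, -1], list(d$Gender), median)

# The same medians for a single variable, computed with tapply()
len_med <- tapply(d$Length, d$Gender, median)
```

The Length column of agg should agree with len_med, and agg has one row for each of the three groups F, I, and M.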

6.4.2 The cut() Function


A common way to generate factors, especially for tables, is the cut()
function. You give it a data vector x and a set of bins defined by a vector b. The
function then determines which bin each of the elements of x falls into.
The following is the form of the call we’ll use here:

y <- cut(x,b,labels=FALSE)
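A small sketch of that call, with values for x and b invented for illustration. It relies on cut()'s default of right-closed intervals, so the bins here are (0,1], (1,2], and (2,3].

```r
# Hypothetical data vector and bin boundaries
x <- c(0.2, 1.5, 2.7, 0.9, 2.0)
b <- c(0, 1, 2, 3)

# With labels=FALSE, the result holds bin numbers rather than interval labels
y <- cut(x, b, labels = FALSE)

# A value outside every bin is reported as NA
out <- cut(5, b, labels = FALSE)
```

Here y is 1, 2, 3, 1, 2: the value 2.0 lands in the second bin because each bin includes its right endpoint.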
