Going one step further, we could save these groups in a list, like this:
> grps <- list()
> for (gen in c("M","F","I")) grps[[gen]] <- which(g==gen)
> grps
$M
[1] 1 5 6

$F
[1] 2 3 7

$I
[1] 4
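The loop above relies on the vector g from earlier in the chapter. To run it on its own, the sketch below assumes a hypothetical seven-element g, chosen so that it reproduces the output shown:

```r
# Hypothetical gender codes (assumed): "M" at positions 1, 5, 6;
# "F" at 2, 3, 7; "I" at 4
g <- c("M","F","F","I","M","M","F")
grps <- list()
# For each code, record the positions in g where that code occurs
for (gen in c("M","F","I")) grps[[gen]] <- which(g==gen)
print(grps)
```

Each component of grps is an integer vector of positions, and the component names come from the loop variable gen.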
Note that we take advantage of the fact that R’s for() loop can iterate
over a vector of strings. (You’ll see a more efficient approach in
Section 4.4.)
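For comparison, here is one loop-free way to build the same kind of list, using base R’s split() on the index vector seq_along(g). This is not necessarily the approach taken in Section 4.4, and g is again a hypothetical example vector. Note that split() orders the components by the sorted factor levels (F, I, M), so we reorder them to match the loop:

```r
g <- c("M","F","F","I","M","M","F")  # hypothetical example vector
# split() groups the indices 1..length(g) according to the value of g
# at each index; the components are ordered F, I, M (sorted levels)
grps2 <- split(seq_along(g), g)
# Reorder the components to match the loop's M, F, I order
grps2 <- grps2[c("M","F","I")]
print(grps2)
```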
We might use our recoded data to draw some graphs, exploring the various
variables in the abalone data set. Let’s name the variables by adding the
following header line to the file:
Gender,Length,Diameter,Height,WholeWt,ShuckedWt,ViscWt,ShellWt,Rings
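If you would rather not edit the data file, read.csv() can instead be told that the file has no header (header=FALSE) and be given the column names through its col.names argument. So that the sketch runs on its own, it reads from the text= argument, using two made-up rows in place of abalone.data; the values are invented for illustration.

```r
# Column names taken from the header line above
abnames <- c("Gender","Length","Diameter","Height","WholeWt",
             "ShuckedWt","ViscWt","ShellWt","Rings")
# Two made-up rows standing in for the headerless abalone.data file
rows <- "M,0.45,0.36,0.10,0.51,0.22,0.10,0.15,15
F,0.53,0.42,0.14,0.68,0.26,0.14,0.21,9"
# header=FALSE: the first row is data; col.names supplies the names
aba <- read.csv(text=rows, header=FALSE, col.names=abnames, as.is=TRUE)
str(aba)
```

With the real file, the call would be read.csv("abalone.data", header=FALSE, col.names=abnames, as.is=TRUE).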
We could, for instance, plot diameter versus length, with a separate plot
for males and females, using the following code:
aba <- read.csv("abalone.data",header=TRUE,as.is=TRUE)
grps <- list()
for (gen in c("M","F")) grps[[gen]] <- which(aba$Gender==gen)
abam <- aba[grps$M,]
abaf <- aba[grps$F,]
plot(abam$Length,abam$Diameter)
points(abaf$Length,abaf$Diameter,pch="x")
First, we read in the data set, assigning it to the variable aba (to remind
us that it’s abalone data). The call to read.csv() is similar to the read.table()
call we used in Chapter 1; we’ll discuss both functions in Chapters 6 and 10.
We then form abam and abaf, the submatrices of aba corresponding to males
and females, respectively. Note that which() is applied to aba$Gender==gen,
a comparison on the Gender column alone, so that it returns row numbers.
Next, we create the plots. The call to plot() draws a scatter plot of diameter
against length for the males. Since we want the females superimposed on the
same graph as the males, we add them with points(), which draws on the
existing plot; a second call to plot() would instead start a new graph. The
argument pch="x" means that we want the plot characters for the females
to be x characters, rather than the default open circles.
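The row subsets can also be formed without which() or the grps list, by indexing the data frame directly with a logical vector. A small sketch, using a hypothetical three-row data frame (values invented for illustration) in place of the real aba:

```r
# Hypothetical stand-in for aba, with made-up values
aba <- data.frame(Gender=c("M","F","M"),
                  Length=c(0.45,0.53,0.35),
                  Diameter=c(0.36,0.42,0.26),
                  stringsAsFactors=FALSE)
# A logical vector in the row position keeps the rows where it is TRUE
abam <- aba[aba$Gender=="M",]
abaf <- aba[aba$Gender=="F",]
print(abam)
print(abaf)
```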
The graph (for the entire data set) is shown in Figure 2-1. By the way, it
is not completely satisfactory. Apparently, there is such a strong correlation
between diameter and length that the points densely fill up a section of the