I was particularly interested in three occupations and thus extracted
subdata frames for them to make their analyses more convenient:
se2006 <- all2006[grep("Software Engineer",all2006),]
prg2006 <- all2006[grep("Programmer",all2006),]
ee2006 <- all2006[grep("Electronics Engineer",all2006),]
Here, I used R’s grep() function to identify the rows containing the given
job title. Details on this function are in Chapter 11.
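As a small illustration of the row selection, here is a sketch on a toy data frame. Note that grep() returns indices into the vector it is given, so to pick out rows it is applied here to a single column. The data frame toy and the column name Job_Title are hypothetical stand-ins, not the actual layout of all2006.

```r
# Toy stand-in for all2006; the column name Job_Title is an assumption.
toy <- data.frame(
  Job_Title = c("Software Engineer", "Programmer Analyst",
                "Senior Software Engineer", "Electronics Engineer"),
  Wage = c(90000, 60000, 110000, 85000),
  stringsAsFactors = FALSE
)

# grep() returns the positions of the elements that contain the pattern
se_rows <- grep("Software Engineer", toy$Job_Title)

# Use those positions as row indices to extract the subdata frame
se_toy <- toy[se_rows, ]
```

Because the pattern is matched as a substring, the “Senior Software Engineer” row is selected along with the “Software Engineer” row.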
Another aspect of interest was analysis by firm. I wrote this function to
extract the subdata frame for a given firm:
makecorp <- function(corpname) {
t <- all2006[all2006$Employer_Name == corpname,]
return(t)
}
I then created subdata frames for a number of firms (only some are
shown here).
corplist <- c("MICROSOFT CORPORATION","ms","INTEL CORPORATION","intel",
   "SUN MICROSYSTEMS, INC.","sun","GOOGLE INC.","google")
for (i in 1:(length(corplist)/2)) {
corp <- corplist[2*i-1]
newdtf <- paste(corplist[2*i],"2006",sep="")
assign(newdtf,makecorp(corp),pos=.GlobalEnv)
}
There’s quite a bit to discuss in the above code. First, note that I want
the variables I’m creating to be at the top (that is, global) level, which is the
usual place one does interactive analysis. Also, I’m creating my new variable
names from character strings, such as “intel2006.” For these reasons, the
assign() function is wonderful. It allows me to assign a variable by its name as
a string and enables me to specify top level (as discussed in Section 7.8.2).
The paste() function allows me to concatenate strings, with sep="" specifying
that I don’t want any characters between strings in my concatenation.
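To see assign() and paste() working together in isolation, here is a minimal, self-contained sketch of the same pattern. The data frame toy2006, the firm names, and the prefixes are all made up for illustration.

```r
# Toy stand-in for all2006; firm names and wages are invented.
toy2006 <- data.frame(
  Employer_Name = c("FIRM A INC.", "FIRM B CORP.", "FIRM A INC."),
  Wage = c(70000, 82000, 91000),
  stringsAsFactors = FALSE
)

# Each full firm name is followed by the prefix for its new variable name
firmlist <- c("FIRM A INC.", "a", "FIRM B CORP.", "b")

for (i in seq_len(length(firmlist) / 2)) {
  firm <- firmlist[2 * i - 1]
  # Build the variable name as a string, with no separator
  newname <- paste(firmlist[2 * i], "2006", sep = "")
  # Create a top-level variable with that name
  assign(newname, toy2006[toy2006$Employer_Name == firm, ], pos = .GlobalEnv)
}

# The variables a2006 and b2006 now exist at the top level
nrow(a2006)
nrow(b2006)
```

In current versions of R, paste0(x, "2006") is shorthand for paste(x, "2006", sep="").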
5.3 Merging Data Frames
In the relational database world, one of the most important operations is
that of a join, in which two tables can be combined according to the values
of a common variable. In R, two data frames can be similarly combined
using the merge() function.
The simplest form is as follows:
merge(x,y)
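As a quick sketch of what this call does, consider two toy data frames that share one column name. The data and the column names firm, employees, and city are invented for illustration.

```r
# Two toy data frames with one column name, firm, in common
x <- data.frame(firm = c("a", "b", "c"), employees = c(10, 20, 30),
                stringsAsFactors = FALSE)
y <- data.frame(firm = c("b", "c", "d"), city = c("X", "Y", "Z"),
                stringsAsFactors = FALSE)

# With no further arguments, merge() joins on the commonly named column(s)
# and keeps only the rows whose key value appears in both data frames
m <- merge(x, y)
m
```

Here the result has one row each for firms b and c, since those are the only values of firm found in both x and y.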