94 A Practical Guide to Cancer Systems Biology
sb<- as.character(sapply(sb, "["))
names(sb)<-ez
Note that whenever you are unsure how to use a function, say mget, you can
type help(mget) to learn more about it.
Next, you can build a vector of strings that combines each gene symbol with
its corresponding Entrez ID, separated by a vertical bar “|”:
sbez<- paste(sb, ez, sep="|")
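As a small illustration of the combined names, the sketch below builds them for two genes and then splits them back apart. The vectors here are toy stand-ins for the sb and ez objects created above, not the chapter's data. Because "|" is also the regular-expression alternation operator, the split assumes fixed = TRUE.

```r
# Toy stand-ins for the chapter's sb (symbols) and ez (Entrez IDs)
sb <- c("TP53", "EGFR")
ez <- c("7157", "1956")

# Combine symbol and Entrez ID with a vertical bar
sbez <- paste(sb, ez, sep = "|")
print(sbez)

# Split the names back into their parts; fixed = TRUE treats "|" literally
parts   <- strsplit(sbez, "|", fixed = TRUE)
symbols <- sapply(parts, "[", 1)
ids     <- sapply(parts, "[", 2)
```

Keeping the Entrez ID in the name makes each row traceable even when a gene symbol is later renamed.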
Now, you can replace the old row names in the DATA.ALL matrix with the
new gene names sbez, drop the rows that have no Entrez ID, and order the
remaining rows alphabetically by their new row names:
rownames(DATA.ALL)<-sbez
DATA.ALL<- DATA.ALL[!is.na(ez), ]
DATA.ALL<- DATA.ALL[order(row.names(DATA.ALL)),]
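The same three steps can be traced on a toy matrix. The sketch below is illustrative only: the matrix toy, its probe names, and the sample names are hypothetical and stand in for DATA.ALL. Note that paste turns a missing value into the string "NA", which is why the filter tests ez rather than the pasted names.

```r
# Hypothetical 4-probe x 2-sample matrix standing in for DATA.ALL
toy <- matrix(1:8, nrow = 4,
              dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))

# Probe p2 has no annotation; p1 and p4 map to the same gene
sb <- c("TP53", NA, "EGFR", "TP53")
ez <- c("7157", NA, "1956", "7157")
sbez <- paste(sb, ez, sep = "|")   # the unannotated probe becomes "NA|NA"

rownames(toy) <- sbez
toy <- toy[!is.na(ez), ]           # drop probes without an Entrez ID
toy <- toy[order(rownames(toy)), ] # sort rows by the new names
print(toy)
```

After these steps the matrix still contains two rows named TP53|7157, which is the duplication handled in the next step.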
- Taking median values for multiple genes of the
 transformed dataset
After DATA.ALL is transformed into a gene-based matrix, several rows may
still carry expression values for the same gene. A common solution is to
take, for each sample (patient), the median of the log2-transformed
expression values over all rows that share a gene name. To do this, you can
define a function median4MultiRows that takes an input matrix with row names
and returns an output matrix whose row names are unique and sorted, and
whose entries are the column-wise medians over the input rows with the same
name:
median4MultiRows<- function(mtrx){
  # Sorted, de-duplicated row names define the rows of the output
  uniqueNames<- unique(rownames(mtrx))
  uniqueNames<- uniqueNames[order(uniqueNames)]
  out<- NULL
  for (nm in uniqueNames) {
    # Column-wise median over all rows that share the name nm
    out<- rbind(out, apply(mtrx[rownames(mtrx) %in% nm, , drop=FALSE],
                           2, median))
  }
  rownames(out)<- uniqueNames
  return(out)
}
Now, you can use this function to resolve the multiple gene name problem:
DATA.ALL<- median4MultiRows(DATA.ALL)
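To see what this collapsing step produces, the following sketch applies the same idea to a toy matrix. It is not the chapter's function but an assumed equivalent that groups row indices with split and binds the per-gene medians with do.call(rbind, ...), which avoids growing the result inside a loop. The names toy, groups, and collapsed are hypothetical.

```r
# Toy log2 matrix: two rows for gene "A|1", one row for gene "B|2"
toy <- matrix(c(1, 3, 10,
                2, 6, 20),
              nrow = 3,
              dimnames = list(c("A|1", "A|1", "B|2"), c("s1", "s2")))

# Group row indices by row name; split() returns the groups sorted by name
groups <- split(seq_len(nrow(toy)), rownames(toy))

# One row of column-wise medians per gene, bound into a matrix
collapsed <- do.call(rbind, lapply(groups, function(idx)
  apply(toy[idx, , drop = FALSE], 2, median)))
print(collapsed)
```

Here the two A|1 rows, (1, 2) and (3, 6), collapse to their medians (2, 4), while the single B|2 row is returned unchanged.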
