A Practical Guide to Cancer Systems Biology





sb<- as.character(sapply(sb, "["))
names(sb)<-ez
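The two lines above flatten a list of symbols into a named character vector. A minimal sketch of that step, assuming sb came from a lookup such as mget that returns one list element per Entrez ID; the environment lookup and the variables ez_toy, sb_list and sb_toy are hypothetical stand-ins, not objects from the original analysis:

```r
# Hypothetical lookup table standing in for an annotation map
lookup <- new.env()
assign("7157", "TP53", envir = lookup)
assign("672", "BRCA1", envir = lookup)

# "0000" is deliberately absent to show how missing IDs behave
ez_toy <- c("7157", "672", "0000")

# mget() returns a named list; IDs that are not found become NA
sb_list <- mget(ez_toy, envir = lookup, ifnotfound = NA)

# Take the first element of each list entry and flatten to a character vector
sb_toy <- as.character(sapply(sb_list, "[", 1))
names(sb_toy) <- ez_toy
print(sb_toy)
```

Taking the first element explicitly (the extra argument 1) is an assumption made here so that the result has exactly one symbol per ID even if a lookup returned several.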



Note that whenever you are unsure how to use a function, say mget, you can
type help(mget) to read the documentation of that function.
Next, you can build a vector of strings that joins each gene symbol and its
corresponding Entrez ID with a vertical bar "|":



sbez<- paste(sb, ez, sep="|")
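As a small illustration of how paste combines the two vectors element by element, here is a sketch on toy inputs. The names ez_toy, sb_toy, sbez_toy, parts and symbols are made up for this example:

```r
# Toy inputs, for illustration only
ez_toy <- c("7157", "672", "1956")
sb_toy <- c("TP53", "BRCA1", "EGFR")
names(sb_toy) <- ez_toy

# paste() is vectorized: element i of sb_toy is joined with element i of ez_toy
sbez_toy <- paste(sb_toy, ez_toy, sep = "|")
print(sbez_toy)

# The symbol can be recovered later by splitting on the literal bar;
# fixed = TRUE is needed because "|" is a regular-expression operator
parts <- strsplit(sbez_toy, "|", fixed = TRUE)
symbols <- sapply(parts, "[", 1)
print(symbols)
```

Keeping the Entrez ID in the name means that two rows end up with the same name only when both the symbol and the ID agree.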



Now, you can replace the old row names in the DATA.ALL matrix with the
new gene names sbez, drop the rows that have no Entrez ID, and order the
remaining rows alphabetically by their new row names:



rownames(DATA.ALL)<-sbez
DATA.ALL<- DATA.ALL[!is.na(ez), ]
DATA.ALL<- DATA.ALL[order(row.names(DATA.ALL)),]
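The same three steps can be tried on a small made-up matrix to see what they do to the rows. This is only a sketch: toy, ez_toy and sb_toy are hypothetical and the expression values are invented:

```r
# Toy probe-level matrix: 4 probes x 2 samples (made-up values)
toy <- matrix(c(5.1, 7.3, 6.0, 4.2,
                5.4, 7.0, 6.2, 4.0),
              nrow = 4,
              dimnames = list(paste0("probe", 1:4), c("S1", "S2")))

ez_toy <- c("7157", NA, "672", "7157")   # probe2 has no Entrez ID
sb_toy <- c("TP53", NA, "BRCA1", "TP53")

rownames(toy) <- paste(sb_toy, ez_toy, sep = "|")  # rename rows
toy <- toy[!is.na(ez_toy), ]                       # drop unannotated probes
toy <- toy[order(rownames(toy)), ]                 # sort rows by name
print(toy)
```

After these steps the toy matrix still contains two rows with the same gene name, which is the situation the next section deals with.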




  1. Taking median values for multiple genes of the transformed dataset


After DATA.ALL is transformed into a gene-based matrix, the next problem to
handle is that several rows of expression values may correspond to the same
gene. A common solution is to take, in each sample (patient), the median of
the log2-transformed expression values of all rows that share a gene name.
To do so, you can define a function median4MultiRows that takes an input
matrix with row names and returns an output matrix whose row names are
sorted and unique, and in which each row holds the per-column medians of
all input rows that carry that name:



median4MultiRows<-function(mtrx){
  uniqueNames<- unique(rownames(mtrx))
  uniqueNames<- uniqueNames[order(uniqueNames)]
  out<- NULL
  for (nm in uniqueNames)
    out<- rbind(out, apply(mtrx[rownames(mtrx) %in% nm,, drop=FALSE],
      2, median))
  rownames(out)<- uniqueNames
  return(out)
}



Now, you can use this function to resolve the multiple gene name problem:



DATA.ALL<- median4MultiRows(DATA.ALL)
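To see what the function does, it can be run on a small matrix with duplicated row names. The sketch below restates the function from the text so that the block runs on its own. The matrix toy and the result collapsed are hypothetical, with invented values:

```r
# Restated from the text so this block is self-contained
median4MultiRows <- function(mtrx) {
  uniqueNames <- unique(rownames(mtrx))
  uniqueNames <- uniqueNames[order(uniqueNames)]
  out <- NULL
  for (nm in uniqueNames)
    out <- rbind(out, apply(mtrx[rownames(mtrx) %in% nm, , drop = FALSE],
                            2, median))
  rownames(out) <- uniqueNames
  return(out)
}

# Toy matrix: three rows for one gene, one row for another (made-up values)
toy <- matrix(c(4, 6, 8, 3,
                5, 7, 9, 2),
              nrow = 4,
              dimnames = list(c("TP53|7157", "BRCA1|672",
                                "TP53|7157", "TP53|7157"),
                              c("S1", "S2")))

collapsed <- median4MultiRows(toy)
print(collapsed)
```

The three TP53 rows collapse into one row holding the column-wise medians, while the single BRCA1 row passes through unchanged. The drop=FALSE argument matters here: without it, a gene with a single row would be reduced to a plain vector and apply would fail.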

