Systems Biology (Methods in Molecular Biology)


  1. For each species, map the gene expression data to gene
    function-based expression index levels (specifically, enrichment
    scores based on the Gene Set Enrichment Analysis test statistic
    [24]) using the Gene Set Variation Analysis (GSVA) algorithm
    [13]. This can be done in two R commands:


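# Load the gene set container classes and the GSVA implementation
library(GSEABase)
library(GSVA)
# Command 1: read the MSigDB C5 gene sets from the GMT file as a named list
gsc5 <- geneIds(getGmt("c5.all.v5.1.symbols.gmt",
                       collectionType=BroadCollection(category="c5"),
                       geneIdType=SymbolIdentifier()))
# Command 2: compute per-sample enrichment scores for the dog expression data
gsva_dog_c5 <- gsva(data.matrix(rsc_norm_exp_mapped_dog), gset.idx.list=gsc5,
                    rnaseq=TRUE, method=c("gsva"), verbose=TRUE)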



head(gsva_dog_c5$es.obs)
                           TCC.1  TCC.2 TCC.3 TCC.4 TCC.5 TCC.6 TCC.7 normal.1 normal.2 normal.3
NUCLEOPLASM               -0.193 -0.205  0.02  0.13 -0.17  0.12  0.16   -2e-01   -0.033     0.09
EXTRINSIC_TO_PLASMA_MEMB. -0.022  0.049 -0.11  0.24  0.18  0.14  0.31    2e-01   -0.234    -0.25
ORGANELLE_PART            -0.206 -0.083  0.04  0.08 -0.05  0.13  0.07   -2e-01   -0.006     0.02
CELL_PROJECTION_PART       0.287 -0.124 -0.07  0.05  0.13  0.08 -0.23    2e-01    0.101     0.28
CYTOPLASMIC_VESICLE_MEMB. -0.008 -0.001 -0.22  0.34  0.19  0.45  0.39   -4e-01   -0.311    -0.37
GOLGI_MEMBRANE             0.122 -0.063  0.01  0.02  0.15  0.14 -0.04   -3e-04   -0.109    -0.17




dim(gsva_dog_c5$es.obs)
[1] 1454 10



In the above example R code, the function "getGmt"
reads in the gene set information from a file in GMT format;
the function "geneIds" returns the gene set information as a
list; the "data.matrix" function constructs a numeric matrix
from the contents of a data frame; and the function "gsva"
transforms the normalized log2 mRNA-seq count data to gene
function-level, per-sample enrichment scores using the Gene
Set Variation Analysis method.
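
To make the two input shapes concrete, the following base-R sketch builds a toy GMT file and a toy data frame, then converts them to the kind of named list and numeric matrix described above. It does not call GSEABase or GSVA; the set names, gene symbols, and object names (toy_gsc, toy_df, toy_mat) are made up for illustration and are not part of the chapter's data.

```r
# Toy GMT file: one gene set per line, tab-separated as
# <set name> <description> <gene 1> <gene 2> ...
# Set names and gene symbols are hypothetical.
gmt_lines <- c(
  "TOY_SET_A\tdescription A\tGENE1\tGENE2\tGENE3",
  "TOY_SET_B\tdescription B\tGENE2\tGENE4"
)
gmt_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_file)

# Parse into a named list of character vectors (gene set name -> genes),
# the same kind of structure that geneIds() returns for a gene set collection.
fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)
toy_gsc <- lapply(fields, function(x) x[-(1:2)])
names(toy_gsc) <- vapply(fields, function(x) x[1], character(1))

# data.matrix() turns an all-numeric data frame into a numeric matrix,
# keeping the row names (genes) and column names (samples).
toy_df <- data.frame(
  TCC.1    = c(5.1, 3.2, 0.0, 7.4),
  normal.1 = c(4.8, 3.9, 1.2, 6.6),
  row.names = c("GENE1", "GENE2", "GENE3", "GENE4")
)
toy_mat <- data.matrix(toy_df)

str(toy_gsc)
dim(toy_mat)
```

A gene-by-sample matrix and a named list of gene sets in these shapes are what the "gsva" call in the example takes as its first argument and as gset.idx.list, respectively.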


  2. Merge the GSVA-transformed expression data matrices from
    the two species, and then merge the combined data with the
    sample metadata for the two species' datasets:
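
One way such a merge might look is sketched below in base R with toy data. This is an assumption-laden illustration, not the chapter's code: the second species is not named in this excerpt, so it is labeled "sp2", and every object name, sample name, metadata column, and score value is hypothetical.

```r
# Toy stand-ins for the per-species GSVA score matrices
# (rows = gene sets, columns = samples). All values are made up.
gsva_dog <- matrix(
  c(-0.19, 0.12, -0.02, 0.31, 0.29, -0.23),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("NUCLEOPLASM", "GOLGI_MEMBRANE", "ORGANELLE_PART"),
                  c("dog.TCC.1", "dog.normal.1"))
)
gsva_sp2 <- matrix(
  c(0.05, -0.11, 0.22, -0.30),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ORGANELLE_PART", "NUCLEOPLASM"),
                  c("sp2.TCC.1", "sp2.normal.1"))
)

# Keep only gene sets scored in both species, in a common row order,
# then bind the samples column-wise.
shared_sets <- intersect(rownames(gsva_dog), rownames(gsva_sp2))
gsva_merged <- cbind(gsva_dog[shared_sets, , drop = FALSE],
                     gsva_sp2[shared_sets, , drop = FALSE])

# Hypothetical sample metadata, one row per sample, stacked across species.
meta_dog <- data.frame(sample = colnames(gsva_dog), species = "dog",
                       tissue = c("tumor", "normal"))
meta_sp2 <- data.frame(sample = colnames(gsva_sp2), species = "sp2",
                       tissue = c("tumor", "normal"))
meta_merged <- rbind(meta_dog, meta_sp2)

# Long format: one row per (gene set, sample), joined to its metadata.
scores_long <- as.data.frame(as.table(gsva_merged),
                             stringsAsFactors = FALSE)
names(scores_long) <- c("gene_set", "sample", "score")
merged <- merge(scores_long, meta_merged, by = "sample")

dim(gsva_merged)
head(merged)
```

Restricting to the shared gene sets before binding avoids silently misaligning rows when the two species' matrices cover different gene sets or list them in a different order.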


Cross-Species RNA-Seq Analysis 299