Systems Biology (Methods in Molecular Biology)

genes_exp_dog <- rsc_maxexp_dog$gene[which(rsc_maxexp_dog$max_exp >= cutoff_dog)]
rsc_exp_dog <- rsc_dog[genes_exp_dog, ]
rsc_norm_exp_dog <- rsc_norm_dog[genes_exp_dog, ]
> head(rsc_norm_exp_dog)
                     TCC.1  TCC.2   TCC.3  TCC.4  TCC.5  TCC.6  TCC.7 normal.1 normal.2 normal.3
ENSCAFG00000014413   4.146  38.30  386.92  16.99  15.42   5.00  13.44    20.73    21.81    19.48
ENSCAFG00000014412  97.440  55.53  112.05 140.53 182.27 169.37 174.69    77.00    65.44   103.18
ENSCAFG00000014410  68.416  28.72   45.52  21.62  60.46  48.75  11.94    41.46    41.55    85.14
ENSCAFG00000014416 456.104 375.33  353.65 576.01 124.24 478.74 469.57   616.00   314.24   162.35
ENSCAFG00000014415  38.354  51.70   52.52  60.23  73.76  63.75  82.87    74.04    84.14    71.43
ENSCAFG00000020948 572.204 712.36 1442.62 237.82  29.93 204.37 431.50   796.66   382.80   324.69
> dim(rsc_norm_exp_dog)
[1] 13572    10


  1. For each species, select only the genes that are expressed (as in
    step 3) and whose Ensembl gene identifier maps to an HGNC
    gene symbol, and convert the normalized expression data to
    log2 scale. Where multiple Ensembl gene identifiers map
    to a single HGNC gene symbol, average their expression
    values in log2 scale. For the dog dataset, the R code
    and example output are:


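# Keep only Ensembl genes with a non-empty gene symbol, merge with the
# expression data by row name (Ensembl gene ID), then average
# log2(x + 1) expression over Ensembl genes sharing a symbol.
rsc_norm_exp_mapped_dog <- data.frame(
  aggregate(. ~ Associated.Gene.Name,
            data=merge(dog_ensgene_to_symbol[which(dog_ensgene_to_symbol$Associated.Gene.Name != ""),
                                             , drop=FALSE],
                       rsc_norm_exp_dog, by.x=0, by.y=0)[,-1],
            FUN=function(expvals) {mean(log2(expvals+1))}),
  row.names=1)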
> head(rsc_norm_exp_mapped_dog)
        TCC.1 TCC.2  TCC.3 TCC.4  TCC.5  TCC.6 TCC.7 normal.1 normal.2 normal.3
5S_rRNA 1.050 4.302  3.509 3.793 0.7252 1.4334 1.436    3.957    4.023    2.957
7SK     4.045 4.216  4.017 4.733 4.1991 4.1246 4.722    4.090    4.226    3.943
A2M     9.674 9.387 10.456 5.063 8.7988 4.1598 4.453    8.441    9.698   10.338
A4GALT  5.779 4.333  4.673 5.128 4.5833 4.3923 4.297    5.982    4.096    5.001
AAAS    4.908 5.148  5.171 4.768 5.8385 5.2668 4.762    3.983    4.261    5.155
AADAC   0.000 3.114  2.644 0.000 0.3810 0.7004 0.000    3.683    3.751    3.879
> dim(rsc_norm_exp_mapped_dog)
[1] 11703    10
In the above statement, the R function "merge" combines the
data frame containing the mRNA-seq data with the data frame
containing the mappings between Ensembl gene identifiers and
HGNC gene symbols, matching rows by row name (by.x=0 and
by.y=0); the R function "aggregate" computes the per-sample
average expression level, in log2(x+1) scale, over all Ensembl
genes that map to a given HGNC gene symbol; and the function
"data.frame" constructs a new data frame from a given data
frame, taking the first column of the given data frame as the
row names of the new data frame.
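
To make the roles of "merge", "aggregate", and "data.frame" easier to follow, the same pattern can be run on a small toy dataset. The sketch below is illustrative only: the identifiers (ENSG1 to ENSG4), the symbols (GENEA, GENEB), the sample names (s1, s2), and the object names (toy_map, toy_exp, toy_result) are all hypothetical and do not come from the dog dataset. The expression values are chosen so that log2(x + 1) gives whole numbers.

```r
# Hypothetical mapping table: row names are Ensembl-style gene IDs.
# ENSG1 and ENSG2 share a symbol; ENSG4 has no symbol and is dropped.
toy_map <- data.frame(
  Associated.Gene.Name = c("GENEA", "GENEA", "GENEB", ""),
  row.names = c("ENSG1", "ENSG2", "ENSG3", "ENSG4"),
  stringsAsFactors = FALSE
)

# Hypothetical normalized expression values for two samples.
toy_exp <- data.frame(
  s1 = c(1, 7, 3, 15),
  s2 = c(3, 3, 0, 1),
  row.names = c("ENSG1", "ENSG2", "ENSG3", "ENSG4")
)

# Step 1: keep only genes with a non-empty symbol.
toy_mapped <- toy_map[which(toy_map$Associated.Gene.Name != ""), , drop = FALSE]

# Step 2: merge by row names (by.x = 0, by.y = 0); the merged result gets a
# leading "Row.names" column, which [, -1] removes.
toy_merged <- merge(toy_mapped, toy_exp, by.x = 0, by.y = 0)[, -1]

# Step 3: for each symbol and each sample, average log2(x + 1).
toy_agg <- aggregate(. ~ Associated.Gene.Name, data = toy_merged,
                     FUN = function(expvals) mean(log2(expvals + 1)))

# Step 4: use the first column (the symbol) as row names.
toy_result <- data.frame(toy_agg, row.names = 1)
print(toy_result)
```

Here GENEA is the average of two Ensembl genes (for s1, the mean of log2(2) = 1 and log2(8) = 3 is 2), while GENEB comes from a single gene, so its values are just log2(x + 1) of the original values.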

298 Stephen A. Ramsey
