Systems Biology (Methods in Molecular Biology)

has a benefit of reducing the number of hypothesis tests performed
for individual-gene-level differential expression analysis) [20, 29].
Therefore, it is convenient to select (on a per-species basis) the
local intermodal minimum as the cutoff for defining a gene that is
expressed in at least one sample group [30], as shown in the next step.


  1. For each species, compute the minimum-expression-level cutoff
    using kernel density estimation. For the dog dataset, the R
    commands and expected output would be


maxexp_density_dog <- density(rsc_maxexp_dog$max_exp)
cutoff_dog <- optimize(approxfun(maxexp_density_dog$x, maxexp_density_dog$y),
                       interval=c(1,10))$minimum
cutoff_dog

[1] 2.96


(indicating an expression level cutoff of 2.96 for the dog
mRNA-seq dataset). In the above commands, the R function
"density" returns an R object representing the kernel density
estimate of the distribution of its argument, the R function
"approxfun" returns a function that performs linear interpolation
between the given points, and the R function "optimize" searches
a specified interval for the point at which a given function of
a single variable attains a minimum value.
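The cutoff computation can be tried on synthetic data before it is applied to a real dataset. The following is a minimal sketch, not the chapter's analysis: the vector max_exp is simulated as a two-component mixture standing in for rsc_maxexp_dog$max_exp, and the search interval c(1, 5) is an assumption chosen to bracket the valley between the two simulated modes. Because "optimize" can settle on a local minimum when the function has several within the interval, the sketch also compares its result with the lowest-density grid point in the same interval.

```r
# Simulated stand-in for rsc_maxexp_dog$max_exp (hypothetical data).
set.seed(1)
max_exp <- c(rnorm(4000, mean = 0.5, sd = 0.8),  # low mode: unexpressed genes
             rnorm(6000, mean = 6.0, sd = 2.0))  # high mode: expressed genes

# Kernel density estimate, evaluated on a regular grid of x values.
dens <- density(max_exp)

# Linear interpolation between grid points gives a function of one variable.
dens_fun <- approxfun(dens$x, dens$y)

# One-dimensional search for the minimum within the assumed interval.
search_interval <- c(1, 5)
cutoff <- optimize(dens_fun, interval = search_interval)$minimum

# Cross-check: lowest-density grid point inside the same interval.
in_range <- dens$x >= search_interval[1] & dens$x <= search_interval[2]
grid_min <- dens$x[in_range][which.min(dens$y[in_range])]

print(c(optimize = cutoff, grid = grid_min))
```

If the two values disagree noticeably, the interval probably contains more than one local minimum and should be narrowed after inspecting a plot of the density.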


  1. For each species, filter the data matrices to remove any genes
    that are not expressed based on the minimum-expression-level
    cutoff that was defined above. For the case of the dog dataset,
    the R commands and example output would be


[Fig. 1 plot: density curves for the two species (legend: dog, human); x-axis: max_exp, 0 to 15; y-axis: density, 0.0 to 0.4]

Fig. 1 Density distributions of the per-gene maximum (across sample groups) of
the within-sample-group-average log2 expression level, for the dog and human
mRNA-seq data sets. It is clear that different expression level thresholds apply
for the two mRNA-seq data sets.
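The filtering step described before Fig. 1 can be illustrated with a small sketch. This is not the chapter's code: the objects maxexp, expr, and expr_filtered, the gene identifiers, and the expression values are all hypothetical, and the use of ">=" (rather than ">") at the cutoff is an assumption. Only the cutoff value of 2.96 is taken from the dog example above.

```r
# Cutoff reported for the dog dataset in the text.
cutoff <- 2.96

# Hypothetical per-gene maxima of the group-average log2 expression.
maxexp <- data.frame(gene_id = c("g1", "g2", "g3", "g4"),
                     max_exp = c(0.4, 3.1, 7.8, 2.5),
                     stringsAsFactors = FALSE)

# Hypothetical expression matrix: genes in rows, sample groups in columns.
expr <- matrix(c(0.1, 0.4,
                 3.1, 2.0,
                 7.8, 7.5,
                 2.5, 1.9),
               ncol = 2, byrow = TRUE,
               dimnames = list(maxexp$gene_id, c("groupA", "groupB")))

# Keep genes whose maximum reaches the cutoff (assumed inclusive).
keep_ids <- maxexp$gene_id[maxexp$max_exp >= cutoff]

# drop = FALSE keeps the result a matrix even if a single gene remains.
expr_filtered <- expr[keep_ids, , drop = FALSE]

print(dim(expr_filtered))
```

Selecting rows by gene identifier rather than by position keeps the filter correct even when the matrix and the table of maxima are ordered differently.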

Cross-Species RNA-Seq Analysis 297