has a benefit of reducing the number of hypothesis tests per-
formed for individual-gene-level differential expression analy-
sis) [20, 29]. Therefore, it is convenient to select (on a
per-species basis) the local intermodal minimum as the cutoff
for defining a gene that is expressed in at least one sample
group [30], as shown in the next step.
- For each species, compute the minimum-expression-level cut-
off using kernel density estimation. For the dog dataset, the R
commands and expected output would be
maxexp_density_dog <- density(rsc_maxexp_dog$max_exp)
cutoff_dog <- optimize(approxfun(maxexp_density_dog$x, maxexp_density_dog$y),
interval=c(1,10))$minimum
cutoff_dog
[1] 2.96
(indicating an expression level cutoff of 2.96 for the dog
mRNA-seq dataset). In the above commands, the R function
"density" returns an R object representing the kernel density
estimate of the distribution of its argument, the R function
"approxfun" returns a function that linearly interpolates the
given points, and the R function "optimize" finds the point at
which a given function of a single variable attains a minimum
value over a specified interval (here, the interval from 1 to
10).
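To make the role of the three functions concrete, the following is a minimal sketch on simulated data, not the chapter's dog dataset. The vector "max_exp_sim", the mixture parameters, and the search interval c(1, 7) are all assumptions chosen only so that the density has two well-separated modes; with real data the interval should bracket the valley seen in the density plot.

```r
# Simulated stand-in for rsc_maxexp_dog$max_exp (hypothetical values):
# a low "background" mode near 1 and a higher "expressed" mode near 7.
set.seed(1)
max_exp_sim <- c(rnorm(4000, mean = 1, sd = 0.7),
                 rnorm(6000, mean = 7, sd = 1.5))

# Kernel density estimate of the simulated per-gene maxima
dens_sim <- density(max_exp_sim)

# Turn the estimated (x, y) grid into a function of one variable
# by linear interpolation
dens_fun <- approxfun(dens_sim$x, dens_sim$y)

# One-dimensional search for the density minimum between the two modes
cutoff_sim <- optimize(dens_fun, interval = c(1, 7))$minimum
cutoff_sim

# Fraction of simulated genes that would pass the cutoff
mean(max_exp_sim > cutoff_sim)
```

Because "optimize" returns a local minimum within the supplied interval, it is worth checking the result against the density plot, for example with plot(dens_sim); abline(v = cutoff_sim).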
- For each species, filter the data matrices to remove any genes
that are not expressed based on the minimum-expression-level
cutoff that was defined above. For the case of the dog dataset,
the R commands and example output would be
Fig. 1 Density distributions of the per-gene maximum (across sample groups) of
the within-sample-group-average log2 expression level, for the dog and human
mRNA-seq data sets (x-axis: max_exp; y-axis: density; one curve per species).
It is clear that different expression level thresholds apply for the two
mRNA-seq data sets