Systems Biology (Methods in Molecular Biology)

ortholog pairs across species [10], but that approach has been
criticized as insufficiently accounting for species-specific effects
[11, 12]. Another challenge that arises in cross-species mRNA-seq
analysis is that the baseline count level for a gene to be considered
above-background can vary from data set to data set, such that a
pan-species expression level cutoff to eliminate low-expressed genes
[10, 11] may not be optimal. In this chapter, I describe a novel
approach to cross-species RNA-seq analysis that circumvents the
above problems by (1) using a kernel density estimation approach
to select the normalized count cutoffs for low-expressed genes on a
per-species basis, (2) reducing mRNA-seq data from gene-level
transcript abundances to gene-function-level indices of transcriptional
activity (using mappings of human genes to gene functional annotations
and the Gene Set Variation Analysis technique [13]), and
(3) comparing the gene function-level indices across species. Using
mRNA-seq data from a comparative oncology study of human and
dog bladder cancer [3], I illustrate step by step (using example code
in the R programming language) how this method can enable
unsupervised cross-species mRNA-seq analysis as well as supervised
cross-species mRNA-seq comparisons.
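
To make step (1) concrete, the following is a minimal sketch in R of a per-species cutoff chosen from a kernel density estimate. It is not the code used in this chapter. The helper name kde_cutoff, the simulated data, and the rule itself (take the density minimum between the two tallest peaks of the log-scale expression distribution) are assumptions made for illustration only; the sketch also assumes that each species' distribution is bimodal, with a low-expression peak and a higher expressed-gene peak.

```r
# Hypothetical helper (not from the chapter): choose a low-expression cutoff
# for one species from the kernel density estimate of per-gene log2
# normalized counts.
# Assumed rule: the cutoff is the point of lowest estimated density between
# the two tallest local maxima of the density curve.
kde_cutoff <- function(log_expr, n_grid = 512) {
  dens <- stats::density(log_expr, n = n_grid)  # Gaussian kernel, default bandwidth
  y <- dens$y
  # Indices of interior local maxima of the estimated density
  interior <- 2:(n_grid - 1)
  peaks <- interior[y[interior] > y[interior - 1] & y[interior] >= y[interior + 1]]
  if (length(peaks) < 2) {
    stop("density is not bimodal; no cutoff can be chosen by this rule")
  }
  # Keep the two tallest peaks, ordered from left to right
  top2 <- sort(peaks[order(y[peaks], decreasing = TRUE)[1:2]])
  # Grid point with the lowest density between those two peaks
  between <- top2[1]:top2[2]
  dens$x[between[which.min(y[between])]]
}

# Simulated per-gene log2 normalized counts for two species (toy data only):
# a low-expression component plus an expressed-gene component.
set.seed(1)
sim_species <- function(n_low, n_high, mu_low, mu_high) {
  c(rnorm(n_low, mean = mu_low, sd = 0.7),
    rnorm(n_high, mean = mu_high, sd = 1.2))
}
log_expr <- list(
  human = sim_species(4000, 8000, mu_low = 0.5, mu_high = 7),
  dog   = sim_species(3000, 7000, mu_low = 1.5, mu_high = 8)
)

# One cutoff per species, rather than a single pan-species cutoff
cutoffs <- vapply(log_expr, kde_cutoff, numeric(1))

# Logical vectors marking the genes retained in each species
kept <- Map(function(x, cut) x > cut, log_expr, cutoffs)
```

Because the two simulated species have different baseline levels, the two entries of cutoffs generally differ, which is the motivation for selecting the cutoff separately for each species. With real data, log_expr would instead hold the normalized counts for each species on a log scale.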

2 Materials


The example mRNA-seq data sets shown in this section are from a
cross-species (human and dog) study of bladder cancer, specifically
transitional cell carcinoma (TCC) of the bladder [3]. The raw
sequence data sets for the dog samples (TCC and normal bladder)
are publicly available in the National Center for Biotechnology
Information Gene Expression Omnibus database under accession
number PRJNA339175. The human bladder cancer mRNA-seq
data are publicly available from the Cancer Genome Atlas
(TCGA) project through the Data Portal of the Genomic Data
Commons website [14] at the National Cancer Institute (TCGA-BLCA
mRNA-seq data set). The human bladder cancer mRNA-seq data files
were obtained under an approved Data Use Request
for General Research Use (dbGaP project # 8059, approval #
34645) and processed to produce a matrix of per-gene/per-sample
raw counts as described in [3]. In this section, I list the software
tools and loaded data tables that would be required to carry out a
cross-species analysis of mRNA-seq data through this workflow,
using data from the bladder cancer cross-species study for illustration
purposes. Completing this analysis workflow will require the
following:


  1. An installation of the R computing environment [15], which is
    a free and open-source implementation of the R statistical
    computing language [16] with an integrated software package


292 Stephen A. Ramsey
