Systems Biology (Methods in Molecular Biology)


  1. Sample group information for the mRNA-seq datasets for each
    species. For each species, the sample group information should
    be contained in a single-column data frame in which the row
    names are unique sample names. A portion of the data frame
    for the example human-dog analysis vignette (along with the
    dimensions of the data frame) is shown here:


> head(dog_sample_info)
    external_name
s01         TCC.1
s02         TCC.2
s03         TCC.3
s05      normal.1
s06         TCC.4
s34         TCC.5
> dim(dog_sample_info)
[1] 10  1

    Sample information for the other species (in this vignette,
    human) should be stored in a similar data frame (in this
    vignette, we will assume the data frame is named
    "human_sample_info").

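As a minimal sketch of the layout described above, a sample information data frame can be built and checked in base R as follows. The six rows are copied from the head() output shown for the dog data; the helper check_sample_info() and the rule used to derive a group from each label (dropping the trailing ".<number>") are assumptions made for illustration and are not part of the vignette.

```r
# Rows copied from the head() output above; the full vignette data frame has
# 10 rows, of which only these six are shown.
dog_sample_info <- data.frame(
  external_name = c("TCC.1", "TCC.2", "TCC.3", "normal.1", "TCC.4", "TCC.5"),
  row.names = c("s01", "s02", "s03", "s05", "s06", "s34"),
  stringsAsFactors = FALSE
)

# check_sample_info() is a hypothetical helper, not part of the vignette.
# It stops with an error if the data frame does not have the expected layout.
check_sample_info <- function(df) {
  stopifnot(
    is.data.frame(df),
    ncol(df) == 1,                     # a single column of group labels
    !anyNA(df[[1]]),                   # every sample has a label
    anyDuplicated(rownames(df)) == 0   # row names (sample names) are unique
  )
  invisible(df)
}

check_sample_info(dog_sample_info)

# Assumption: the sample group is the label with its trailing ".<number>"
# removed (for example, "TCC.1" becomes "TCC").
sample_group <- sub("\\.[0-9]+$", "", dog_sample_info$external_name)
table(sample_group)
```

The same check can be applied to "human_sample_info" before the two data sets are compared, so that a malformed data frame is caught early.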

  1. Ortholog mappings between the two species, in the form of a
    two-column data frame whose first column contains Ensembl
    gene identifiers for the second species (in this example vignette,
    human) and whose second column contains the Ensembl gene
    identifier of an ortholog (if any) for the gene in the first species
    (in this example vignette, dog). Such a mapping can be
    obtained using Ensembl BioMart. A portion of the data
    frame for the example human-dog analysis vignette (along
    with the dimensions of the data frame) is shown here (see
    Note 2).


> head(human_dog_ensg)
  Ensembl.Gene.ID Dog.Ensembl.Gene.ID
1 ENSG00000261657
2 ENSG00000223116
3 ENSG00000233440
4 ENSG00000207157
5 ENSG00000229483
6 ENSG00000252952  ENSCAFG00000025776
> dim(human_dog_ensg)
[1] 65999     2
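
The following sketch shows one way such a mapping might be inspected before use, written in base R. The six rows are copied from the head() output above. Two things here are assumptions rather than part of the vignette: that a missing ortholog is stored as an empty string (NA is also handled), and that one may wish to keep only one-to-one gene pairs.

```r
# Rows copied from the head() output above; the full data frame has 65999 rows.
# Assumption: a human gene with no dog ortholog has an empty string in the
# second column.
human_dog_ensg <- data.frame(
  Ensembl.Gene.ID = c("ENSG00000261657", "ENSG00000223116", "ENSG00000233440",
                      "ENSG00000207157", "ENSG00000229483", "ENSG00000252952"),
  Dog.Ensembl.Gene.ID = c("", "", "", "", "", "ENSCAFG00000025776"),
  stringsAsFactors = FALSE
)

# Flag rows that have a dog ortholog (neither NA nor an empty string).
has_ortholog <- !is.na(human_dog_ensg$Dog.Ensembl.Gene.ID) &
  nzchar(human_dog_ensg$Dog.Ensembl.Gene.ID)
mapped <- human_dog_ensg[has_ortholog, , drop = FALSE]

# Count how often each gene appears among the mapped pairs.
n_dog_per_human <- table(mapped$Ensembl.Gene.ID)
n_human_per_dog <- table(mapped$Dog.Ensembl.Gene.ID)

# Optional step (an assumption, not the vignette's stated procedure):
# keep only pairs in which each gene appears exactly once.
is_one_to_one <-
  as.vector(n_dog_per_human[mapped$Ensembl.Gene.ID]) == 1 &
  as.vector(n_human_per_dog[mapped$Dog.Ensembl.Gene.ID]) == 1
one_to_one <- mapped[is_one_to_one, , drop = FALSE]
```

Restricting to one-to-one pairs avoids ambiguity when expression values from the two species are later matched gene by gene, at the cost of discarding genes with multiple orthologs.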

3 Methods


Below, I outline the steps required to carry out an unsupervised and
a supervised comparison of mRNA-seq data sets from two species,
using as an example mRNA-seq data sets from a cross-species (dog
and human) study of bladder cancer. The first five steps of the
