- For each species, raw counts and normalized counts of aligned
mRNA-seq reads for each gene, based on Ensembl gene identifiers.
Normalized counts can be obtained from the raw counts
using the mRNA-seq analysis R software packages edgeR [27]
(using the "cpm" function) or DESeq2 [20] (using the
"counts" function with the option "normalized=TRUE").
The mRNA-seq counts for the first species (in the example
analysis vignette for this article, dog) should be contained in a
data frame "rsc_dog" for which the row names are Ensembl
gene identifiers and the column names are sample identifiers.
A portion of the data frame (along with the dimensions of the
data frame) is shown here:
> head(rsc_dog)
                   TCC.1 TCC.2 TCC.3 TCC.4 TCC.5 TCC.6 TCC.7 normal.1 normal.2 normal.3
ENSCAFG00000014413     4    20   221    11    51     8    18        7       42       27
ENSCAFG00000014412    94    29    64    91   603   271   234       26      126      143
ENSCAFG00000014410    66    15    26    14   200    78    16       14       80      118
ENSCAFG00000014417     0     0     0     0     0     0     0        0        0        0
ENSCAFG00000014416   440   196   202   373   411   766   629      208      605      225
ENSCAFG00000014415    37    27    30    39   244   102   111       25      162       99
> dim(rsc_dog)
[1] 24580    10
In the above example, the R function "dim" gives the row
and column dimensions of its argument (the data frame
"rsc_dog").
- Normalized mRNA-seq counts for the first species should be
contained in a data frame "rsc_norm_dog" with the same row
and column names as "rsc_dog." A portion of the data frame
(along with the dimensions of the data frame) is shown here:
> head(rsc_norm_dog)
                     TCC.1  TCC.2  TCC.3  TCC.4  TCC.5  TCC.6  TCC.7 normal.1 normal.2 normal.3
ENSCAFG00000014413    4.15  38.30 386.92  16.99  15.42   5.00  13.44    20.73    21.81    19.48
ENSCAFG00000014412   97.44  55.53 112.05 140.53 182.27 169.37 174.69    77.00    65.44   103.18
ENSCAFG00000014410   68.42  28.72  45.52  21.62  60.46  48.75  11.94    41.46    41.55    85.14
ENSCAFG00000014417    0.00   0.00   0.00   0.00   0.00   0.00   0.00     0.00     0.00     0.00
ENSCAFG00000014416  456.10 375.33 353.65 576.01 124.24 478.74 469.57   616.00   314.24   162.35
ENSCAFG00000014415   38.35  51.70  52.52  60.23  73.76  63.75  82.87    74.04    84.14    71.43
> dim(rsc_norm_dog)
[1] 24580    10
Raw and normalized mRNA-seq counts for the second
species (in this vignette, human) should be stored in data
frames named "rsc_human" and "rsc_norm_human,"
respectively.
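The relationship between a raw-count data frame and its normalized counterpart can be illustrated with a minimal base R sketch. It is not the normalization used to produce "rsc_norm_dog" above; it only shows a plain counts-per-million scaling (each column divided by its column sum and multiplied by one million) and a check that the two data frames share row and column names, as required. The names "rsc_toy" and "rsc_norm_toy" are hypothetical, and the toy values are the first three genes and samples from the "rsc_dog" excerpt.

```r
# Toy raw-count data frame in the required layout:
# row names are Ensembl gene identifiers, column names are sample identifiers.
# (Hypothetical object; values copied from the rsc_dog excerpt.)
rsc_toy <- data.frame(
  TCC.1 = c(4, 94, 66),
  TCC.2 = c(20, 29, 15),
  TCC.3 = c(221, 64, 26),
  row.names = c("ENSCAFG00000014413",
                "ENSCAFG00000014412",
                "ENSCAFG00000014410")
)

# Library size of each sample, taken here as the column sum (an assumption
# of this sketch; a real analysis would use all genes, not three).
lib_sizes <- colSums(rsc_toy)

# Counts per million: divide each column by its library size, scale by 1e6.
rsc_norm_toy <- as.data.frame(
  sweep(as.matrix(rsc_toy), 2, lib_sizes, "/") * 1e6
)

# The normalized data frame must keep the same row and column names.
stopifnot(identical(rownames(rsc_norm_toy), rownames(rsc_toy)))
stopifnot(identical(colnames(rsc_norm_toy), colnames(rsc_toy)))

print(round(rsc_norm_toy, 2))
```

In practice, the edgeR "cpm" function or the DESeq2 "counts" function with "normalized=TRUE", as named above, would replace the manual scaling step. The same two "stopifnot" checks can then be applied to "rsc_dog" and "rsc_norm_dog" (and to the human data frames) before proceeding.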