Computational Systems Biology Methods and Protocols.7z

Reactome [61, 62] serves both as an archive of biological
processes, modeling signal transduction, transport, DNA
replication, metabolism, and other cellular processes as an
ordered network of molecular transformations, and as a
bioinformatics tool for discovering unexpected functional
relationships in biological data.

On the other hand, the data from the above resources can have
different data structures, as listed in Table 2, which determine
the direction of follow-up integrative analysis. In mathematical
terms, such high-throughput data can usually be represented as
a vector, a matrix, a tensor, or a combination of these (Fig. 1).
Simply put, any sequence data (e.g., DNA sequencing) can be transformed
into a (sequential) vector; each element of the vector represents a
nucleic acid, an amino acid, or a modification site at a particular
location in one sequence, e.g., the string composed of (A, C, G, T)
read from 5′ to 3′ along a DNA sequence [63], or the barcode-like
signal of methylation levels on CpG islands along the DNA sequence
[64]. Meanwhile, the expression data of genes from a large cohort
study can be organized as a matrix, where a row indicates a gene and
a column indicates a sample, so that each element in a matrix
represents one gene’s expression level in one sample, e.g., the
expression of genes in a group of individuals with the same disease
[65] or gene expression over the cell cycle at consecutive time points
[66]. Next, a three-way biological experiment produces data that can
be viewed as a cube and formalized as a tensor; there are two
general types of such data [67]. One is “gene-sample-source”,
which collects expression data from multiple samples
under several biological conditions, e.g., an element of such a tensor
gives the expression level of one gene in one tissue of a
given sample [68]. The other is “gene-sample-time”, which
gathers the expression data from a sample at a particular time point,

Table 2
The category of data structure


Data structure    Experimental protocol        Cases with visualization
Vector            Nucleic acid or amino acid   The UCSC Genome Browser database [63]
                  Modification site            MEXPRESS visualizing TCGA [64]
Matrix            Gene-sample                  Co-expression of gene profiles [65]
                  Gene-time                    AIE for cell cycle pattern [66]
Tensor            Gene-sample-source           Pan-cancer analysis on TCGA [68]
                  Gene-sample-time             Edge network modeling virus infection [113]
High-order cube   Gene-sample-source-time      Cross-tissue and cross-species transcriptome analysis [70]
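
The vector, matrix, and tensor layouts described in the text can be
sketched with plain nested lists. This is only an illustration: the
gene, sample, and tissue names, the expression values, and the helper
functions shape and lookup are all hypothetical and do not come from
the cited resources.

```python
# Hypothetical toy data; every name and value here is made up for illustration.

# Vector: one sequence, one element per position (read 5' to 3').
dna_vector = list("ACGTTGCA")

# Matrix: rows are genes, columns are samples.
genes = ["geneA", "geneB", "geneC"]
samples = ["s1", "s2"]
expr_matrix = [
    [5.1, 4.8],  # geneA in s1, s2
    [0.3, 0.9],  # geneB in s1, s2
    [2.2, 2.0],  # geneC in s1, s2
]

# Third-order tensor: gene x sample x source (here, tissue).
sources = ["liver", "lung"]
expr_tensor = [
    [[5.1, 4.0], [4.8, 3.9]],  # geneA: s1 (liver, lung), s2 (liver, lung)
    [[0.3, 1.2], [0.9, 1.1]],  # geneB
    [[2.2, 2.5], [2.0, 2.4]],  # geneC
]


def shape(nested):
    """Return the dimensions of a regular (non-ragged) nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


def lookup(gene, sample, source):
    """Expression level of one gene in one source of one sample."""
    g = genes.index(gene)
    s = samples.index(sample)
    t = sources.index(source)
    return expr_tensor[g][s][t]


print(shape(dna_vector))   # one axis: position along the sequence
print(shape(expr_matrix))  # two axes: gene, sample
print(shape(expr_tensor))  # three axes: gene, sample, source
print(lookup("geneB", "s2", "lung"))
```

A "gene-sample-source-time" cube would add one more level of nesting
(a fourth axis) to the same layout. In practice an array library would
likely replace the nested lists; they are used here only to keep the
sketch free of dependencies.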

114 Xiang-Tian Yu and Tao Zeng
