Computational Systems Biology: Methods and Protocols

kernel parameters for each data type beforehand [27]. In a
biological application, the high-throughput screens for mRNA,
miRNA, and proteins have been jointly analyzed using factor
analysis, combined with linear discriminant analysis (LDA), to
identify the molecular characteristics of cancer [22]. When the
focus is on characterizing biological networks, the JointCluster
algorithm finds sets of genes that cluster well in multiple
networks of interest, such as co-expression networks, which
summarize correlations among the expression profiles of genes,
and physical networks, which describe protein-protein and
protein-DNA interactions among genes or gene products [28]. To
produce a comprehensive view of a given disease from diverse
types of genome-wide data, similarity network fusion (SNF),
inspired by the theoretical multi-view learning framework,
constructs a network of samples (e.g., patients) for each data
type and fuses these into one network that represents the
sample patterns underlying the data [102]. Recently, a new
framework called “pattern
fusion analysis” (PFA) has been proposed to perform auto-
mated information alignment and bias correction and to fuse
local sample patterns (e.g., from each data type) into a global
sample pattern corresponding to phenotypes (e.g., across most
data types). In particular, PFA can identify common and
complementary sample patterns from different omics profiles by
optimally adjusting the effects of each data type based on the
local tangent space alignment (LTSA) theory [103].
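
The network-fusion idea described above (one sample network per
data type, fused into a single network) can be illustrated with a
short sketch. This is a simplified illustration, not the published
SNF implementation: the Gaussian affinity, the fixed bandwidth
`sigma`, the neighbor count `k`, and the function names `affinity`,
`knn_kernel`, and `fuse` are all assumptions made for this example.
It expects at least two data types measured on the same samples.

```python
import numpy as np

def affinity(X, sigma=1.0):
    """Gaussian affinity between samples (rows of X). Assumed kernel."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def row_normalize(W):
    """Scale each row so that it sums to 1."""
    return W / W.sum(axis=1, keepdims=True)

def knn_kernel(W, k):
    """Keep the k largest affinities in each row, then row-normalize."""
    n = W.shape[0]
    S = np.zeros_like(W)
    idx = np.argsort(-W, axis=1)[:, :k]
    rows = np.arange(n)[:, None]
    S[rows, idx] = W[rows, idx]
    return row_normalize(S)

def fuse(views, k=3, iters=10, sigma=1.0):
    """Fuse per-data-type sample networks by iterative cross-diffusion.

    views: list of (samples x features) arrays, same sample order.
    Returns one symmetric (samples x samples) fused network.
    """
    W = [affinity(X, sigma) for X in views]
    P = [row_normalize(w) for w in W]    # global (dense) networks
    S = [knn_kernel(w, k) for w in W]    # local (sparse) networks
    for _ in range(iters):
        P_new = []
        for v in range(len(views)):
            # Average the current networks of all *other* data types
            others = [P[u] for u in range(len(views)) if u != v]
            avg = sum(others) / len(others)
            # Diffuse that average through this type's local structure
            Pv = S[v] @ avg @ S[v].T
            P_new.append(row_normalize((Pv + Pv.T) / 2.0))
        P = P_new
    fused = sum(P) / len(P)
    return (fused + fused.T) / 2.0
```

Calling `fuse([X_mrna, X_mirna])` on two hypothetical matrices with
matching rows returns a single sample-by-sample network, which could
then be passed to a clustering method to look for sample groups
supported by both data types.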

3. Matrix-based integration model. Previously, the ping-pong
algorithm was proposed as an integrative scheme for more than
one type of data from the same biological samples; it relies
on co-modules that describe coherent patterns across paired
datasets [29]. These methods fall into several classes
according to the type of matrix decomposition applied: the
first is a joint (nonnegative) matrix factorization technique
that projects multiple types of genomic data onto a common
coordinate system, in which heterogeneous variables weighted
highly in the same projected direction form a multidimensional
module (md-module) [21]; the second is higher-order
generalized singular value decomposition (GSVD), which is
designed for efficient, parameter-free, and reproducible
identification of network modules simultaneously across
multiple conditions [104, 105]; and the third is rank matrix
factorization, used as multi-view bi-clustering to model
subtyping and recognize subtype-specific features
simultaneously, e.g., integrating mutational and expression
data while taking into account the clonal properties of
carcinogenesis [30].
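
The joint nonnegative matrix factorization idea above can be
sketched as follows. This is a minimal illustration under stated
assumptions, not the exact method of [21]: it assumes each data
type is a nonnegative (samples x features) matrix X_i, a shared
sample factor W, and per-type loadings H_i, fitted by standard
multiplicative updates that reduce the sum of squared errors
||X_i - W H_i||^2. The names `joint_nmf` and `md_modules` and the
z-score cutoff `z` are hypothetical choices for this example.

```python
import numpy as np

def joint_nmf(mats, k, iters=200, eps=1e-9, seed=0):
    """Factor each X_i as W @ H_i with one shared nonnegative W.

    mats: list of nonnegative (samples x features_i) arrays.
    Returns W (samples x k) and a list of H_i (k x features_i).
    """
    rng = np.random.default_rng(seed)
    n = mats[0].shape[0]
    W = rng.random((n, k)) + eps
    Hs = [rng.random((k, X.shape[1])) + eps for X in mats]
    for _ in range(iters):
        # Update the shared factor using all data types together
        num = sum(X @ H.T for X, H in zip(mats, Hs))
        den = W @ sum(H @ H.T for H in Hs) + eps
        W *= num / den
        # Update each data type's loadings against the shared factor
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps)
              for X, H in zip(mats, Hs)]
    return W, Hs

def md_modules(Hs, z=1.0):
    """For each factor, list the highly weighted features per data type.

    A feature is selected when the z-score of its loading within
    that factor exceeds z (an assumed, illustrative threshold).
    """
    modules = []
    for j in range(Hs[0].shape[0]):
        per_type = []
        for H in Hs:
            score = (H[j] - H[j].mean()) / (H[j].std() + 1e-12)
            per_type.append(np.where(score > z)[0])
        modules.append(per_type)
    return modules
```

Here each row of H_i plays the role of a projected direction in the
common coordinate system, and the features selected for the same row
across data types together form one candidate module.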


Integrative Analysis of Omics Big Data 123