
Chapter 7


Integrative Analysis of Omics Big Data


Xiang-Tian Yu and Tao Zeng


Abstract


The diversity and sheer volume of omics data have brought biological and biomedical research and its
applications into a big data era, much as happened in human society a decade ago. These data pose a new
challenge, moving from horizontal data ensembles (e.g., similar types of data collected from different labs
or companies) to vertical data ensembles (e.g., different types of data collected for a group of persons with
matched information). This shift requires integrative analysis in biology and biomedicine and calls for the
urgent development of data integration methods to address the change from earlier population-guided to
newer individual-guided investigations.
Data integration is an effective concept for solving complex problems or understanding complicated
systems. Several benchmark studies have revealed the heterogeneity and trade-offs that exist in the analysis
of omics data. Integrative analysis can combine and investigate many datasets in a cost-effective,
reproducible way. Current integration approaches for biological data operate in two modes: a “bottom-up
integration” mode with follow-up manual integration, and a “top-down integration” mode with follow-up
in silico integration.
This paper first summarizes combinatory analysis approaches to provide candidate protocols for
biological experiment design in effective integrative studies of genomics, and then surveys data fusion
approaches to provide guidance on computational model development for detecting biological
significance. These approaches have also provided new data resources and analysis tools to support
precision medicine that depends on big biomedical data. Finally, the problems and future directions of
integrative analysis of omics big data are highlighted.


Key words: Integration, Omics, High throughput, Big data, Complex diseases, Bayesian, Matrix
decomposition, Machine learning, Subtype, Precision medicine

1 Introduction


High-throughput screening is one of the primary technologies for
exploring complex intracellular dynamics in modern biology, and
the data produced by such approaches are usually called omics
data [1]. Genome-level omics first emerged from the Human
Genome Project, which sought to obtain the blueprint of complete
human genetic information; afterward, the transcriptome and proteome
also became available for measuring the expression abundance of
mRNA and protein, respectively [2]. Lately, the epigenomics was

Tao Huang (ed.), Computational Systems Biology: Methods and Protocols, Methods in Molecular Biology, vol. 1754,
https://doi.org/10.1007/978-1-4939-7717-8_7, © Springer Science+Business Media, LLC, part of Springer Nature 2018

