Computational Systems Biology Methods and Protocols.7z

two-way identification of the most likely TF-gene interactions: on
the basis of ENCODE ChIP-Seq binding evidence or JASPAR
prediction, and of co-expression according to the data of the largest
cancer omics resource [112].
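
The two-way filter described above can be sketched as follows. This is only a minimal illustration, not the implementation of the cited resource: the identifiers (TF_A, GENE_1, and so on), the toy expression values, the function names, and the correlation cutoff of 0.8 are all hypothetical, and Pearson correlation is assumed here as the co-expression measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def two_way_pairs(binding_pairs, expression, min_abs_r=0.8):
    """Keep TF-gene pairs with binding evidence AND co-expression.

    binding_pairs: set of (tf, gene) tuples supported by ChIP-Seq peaks
                   or motif prediction (assumed to be precomputed).
    expression:    dict mapping a gene/TF name to its expression profile
                   across the same ordered set of samples.
    min_abs_r:     hypothetical cutoff on |Pearson r|.
    """
    kept = []
    for tf, gene in sorted(binding_pairs):
        if tf not in expression or gene not in expression:
            continue  # no expression data, so co-expression cannot be tested
        r = pearson(expression[tf], expression[gene])
        if abs(r) >= min_abs_r:
            kept.append((tf, gene, round(r, 3)))
    return kept


# Toy inputs (entirely made up for illustration).
binding_pairs = {("TF_A", "GENE_1"), ("TF_A", "GENE_2"), ("TF_B", "GENE_1")}
expression = {
    "TF_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "TF_B": [5.0, 1.0, 4.0, 2.0, 3.0],
    "GENE_1": [2.1, 3.9, 6.2, 8.0, 9.8],
    "GENE_2": [3.0, 1.0, 2.5, 3.5, 1.5],
}
supported = two_way_pairs(binding_pairs, expression)
```

With these toy values only the TF_A/GENE_1 pair passes both criteria; the other two pairs have binding evidence but weak co-expression and are dropped.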

4 Notes


This review has given a comprehensive summary of the data resources,
data analysis, and data visualization that support the integration of
big biological data. Finally, we would like to list several notes on this
review:


  1. Conventional big data from society typically has a large number
    of samples, each with only a few features/attributes. By
    contrast, big biological data supplies a modest but sufficient
    number of samples and measures tens of thousands of features in
    each sample simultaneously. Such small-sample, high-dimensional
    data requires new analytic approaches, including data
    integration.

  2. The “bottom-up integration” mode with follow-up manual
    integration is always a hypothesis-driven approach that extracts
    the significantly enriched or observed biological knowledge from
    the data. The key to these methods is that there should be clear and
    suitable biological hints about the experiments and outcome data,
    so that the data combination can extract the biological signals
    in each type of data and explain the same preset biological
    hypothesis in a single analysis framework. Although corresponding
    integrative analysis frameworks already exist for different
    combinations of data types, a more general and flexible scheme
    that handles both existing and potential new data types is still
    lacking. There is an urgent need for quantitative evaluation of
    the confidence in the driver hypothesis ahead of data analysis,
    and of the contribution of different data types to the
    biological hypothesis.

  3. Meanwhile, the “top-down integration” mode with follow-up in
    silico integration is usually a data-driven approach that
    extracts the most probable feature signals or sample patterns from
    the data. The key to these methods is that there must be efficient
    correction to reduce the noise and bias in the different types of
    data, so that the data fusion can identify coordinated data
    distributions or data correlations across multiple types of data
    in a unified mathematical model. Many techniques are available;
    however, they impose hard constraints on how the data types must
    coordinate, which limits their application to diverse
    biological systems. Thus, more relaxed formulations, e.g.,
    soft-constraint-based approaches, will expand the power of data
    fusion in biological study and help detect unseen biological
    patterns.
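
The contrast in note 3 between strict and soft constraints can be made concrete with a toy example. The sketch below is our own illustration and not a method from the literature reviewed here: it assumes each sample already has one latent score estimated from each of two data types (the hypothetical lists a1 and a2), and couples the two scores with a quadratic penalty of weight lam. Setting lam = 0 leaves the data types independent, a very large lam forces them to agree exactly (the strict constraint), and intermediate values give the soft constraint.

```python
def couple(a1, a2, lam):
    """Per-sample minimiser of (z1-a1)^2 + (z2-a2)^2 + lam*(z1-z2)^2.

    a1, a2: per-sample scores estimated separately from two data types
            (hypothetical inputs for this sketch).
    lam:    non-negative coupling weight; larger values pull the two
            fused scores closer together.
    Setting the gradient to zero gives the closed form used below.
    """
    denom = 1.0 + 2.0 * lam
    z1 = [((1.0 + lam) * x + lam * y) / denom for x, y in zip(a1, a2)]
    z2 = [((1.0 + lam) * y + lam * x) / denom for x, y in zip(a1, a2)]
    return z1, z2


# Toy scores for three samples (made-up numbers).
a1 = [0.0, 2.0, 4.0]
a2 = [1.0, 1.0, 5.0]

independent = couple(a1, a2, 0.0)   # no coupling: scores stay as estimated
soft = couple(a1, a2, 1.0)          # soft constraint: partial agreement
near_hard = couple(a1, a2, 1e9)     # approaches the strict constraint z1 == z2
```

The design point is that the soft version keeps some data-type-specific signal (the two fused scores may still differ) while borrowing strength across data types, whereas the strict version discards any disagreement between them.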


126 Xiang-Tian Yu and Tao Zeng
