Computational Systems Biology Methods and Protocols

of microscopy images [119]. To overcome the pitfalls associated
with conventional machine learning classifiers, a deep convolutional
neural network (DeepLoc) has been developed to analyze yeast cell
images for automated classification of protein subcellular
localization [120]. A deep neural network has also been applied to
prospectively predict lineage choice in differentiating primary
hematopoietic progenitors, using image patches from bright-field
microscopy and cellular movement, before conventional molecular
markers are observable [121]. Notably, a deep learning algorithm
that primarily used surface area information from brain magnetic
resonance imaging at a young age was able to predict a later
diagnosis of autism in individual high-risk children [122]; and a
single convolutional neural network (CNN), trained end to end
directly from images using only pixels and disease labels as
inputs, can classify skin cancer with a level of competence
comparable to that of dermatologists [123].
Although deep learning has shown promising potential for
analyzing omics data [124], the “small-sample, high-dimension”
nature of biological high-throughput data remains a major
challenge (see Note 3). Moreover, the “black box” character of deep
learning and other machine learning methods often hides
information that would be readable and useful for biological and
biomedical research. Thus, it will be important to integrate
multiple data resources to consistently improve collective health
[125] in a discriminative and interpretable manner.
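To make the “small-sample, high-dimension” problem concrete, here is a minimal Python sketch (an illustration, not a method from this chapter) on hypothetical random data: when a linear model has as many features as samples, it can fit labels that are pure noise with essentially zero training error, so training performance alone says nothing about biological signal. The function names `solve` and `fit_noise`, the Gaussian toy features, and the use of a square (p = n) system are assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], so row operations act on both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: bring the row with the largest |entry| in this column up.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_noise(n_samples, seed=0):
    """Fit p = n_samples random features to random labels; return the largest training residual."""
    rng = random.Random(seed)
    # Hypothetical toy data: Gaussian "features" and +/-1 labels drawn independently of them.
    features = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(n_samples)]
    labels = [rng.choice([-1.0, 1.0]) for _ in range(n_samples)]
    weights = solve(features, labels)
    residuals = [
        y - sum(f * w for f, w in zip(row, weights))
        for row, y in zip(features, labels)
    ]
    return max(abs(r) for r in residuals)


# The labels are pure noise, yet the fit is essentially exact (only round-off remains).
print(fit_noise(20))
```

Omics data sets usually sit far beyond this point, with thousands of features per sample, which is why held-out validation, regularization, or feature selection is typically needed before a fitted model is trusted.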

4 Notes


Overall, machine learning plays an important role in current
biological and biomedical research. In particular, these
computational technologies are well suited to analyzing big
biological data. However, unlike conventional big social data,
big omics data are typically “small-sample, high-dimension” (far
fewer samples than measured features), which causes many practical
problems in application and also introduces new challenges.
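Note 1 below names resampling as one remedy for class imbalance. As a minimal sketch under stated assumptions, random oversampling, one common form of resampling, duplicates minority-class samples (drawn with replacement) until every class matches the majority count. The function `random_oversample` and the toy data are hypothetical; this is a generic technique, not the authors' method.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Positions of the existing samples that belong to this class.
        members = [i for i, lab in enumerate(labels) if lab == cls]
        # Draw with replacement: only copies are added, no new information.
        for _ in range(target - n):
            j = rng.choice(members)
            out_samples.append(samples[j])
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy training set: five negatives and a single positive "rare" case.
x = [[0.1], [0.4], [0.3], [0.9], [0.2], [5.0]]
y = [0, 0, 0, 0, 0, 1]
x_bal, y_bal = random_oversample(x, y)
print(Counter(y_bal))  # Counter({0: 5, 1: 5})
```

With a single positive case, all four added positives are identical copies, so the class counts are balanced but no new diversity is created; this is the “extreme imbalance” limitation that motivates borrowing information from other data sources. Resampling should also be applied only to the training split, so that duplicates do not leak into evaluation.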


  1. The class imbalance problem is commonly discussed in machine
    learning modeling; available solutions include resampling,
    one-class models, and anomaly detection. In big biological data,
    however, an “extreme imbalance” problem exists, for example with
    rare mutations or rare diseases, for which it is hard to obtain
    enough positive samples. Thus, methods that integrate prior
    knowledge are required, providing transfer learning approaches
    that borrow (combine) multiple sources of data to assist
    single-sample analysis.

  2. Many machine learning models are “black boxes,” which is
    acceptable for social applications. However, in biological
    fields, the molecular mechanism underlying any

Revisit of Machine Learning Supported Biological and Biomedical Studies 197