study is necessary to deepen our understanding, which means that
“white-box” models are needed. Although some
approaches have tried “gray-box” strategies to extract more
biological detail from the mathematical model, there is still an urgent need
to develop new network-based theories and methods that balance the
trade-off between accuracy and interpretability of machine
learning in biological domains.
- Association is the “white clue” captured in conventional
machine learning studies and applications, and it is also a target
of conventional big data studies. However, “causality” rather than
association is what would be most helpful to biologists, because it can be
used to determine experimental targets and even future
research directions. Therefore, obtaining causal relationships in
a biological community from big biological data calls for
new developments in causal inference on small-sample,
high-dimensional data.