Machine learning has previously been applied in many
biological studies, such as (1) sequence analysis [6] to select
orthologous genes, to identify binding motifs, or to predict functional
domains; (2) image analysis [7] to select image indicators, to
identify object (e.g., cell) boundaries in images, or to classify the
type of object (e.g., molecule) in images; (3) interaction analysis [8] to
extract functional characteristics, to recognize functional modules,
or to predict functional associations; (4) disease analysis [9] to
select disease-associated genes, to identify disease subtypes, or to
diagnose patients and predict their prognoses; and (5) annotation
analysis [10] to select keywords in medical text, to recognize
biological terms in the literature, or to predict a person's state from
questionnaire surveys.
Recently, along with the development of high-throughput
technologies [11, 12], many novel machine learning technologies
have been implemented to handle these new big data [13] in tasks
such as sequence assembly, modification pattern identification,
confounding factor removal, and heterogeneous data integration.
This review presents a broad range of cases to illustrate the
selection of machine learning methods in the different practical
scenarios that arise across the whole biological and biomedical study
cycle, rather than a technical discussion of methodologies. After a brief
introduction to several bioinformatic tools based on machine
learning technologies, this review first presents the categories
of machine learning methods according to their biological application
scenarios; next, machine learning strategies for analyzing omics data,
developed on big biological data, are discussed; and finally, the
potential challenges for machine learning in cutting-edge biological
studies are examined.
2 Materials
Generally, machine learning techniques aim to develop novel
algorithms that enable computers to assist human beings in the
analysis of large, complex datasets [14]. With big biomedical
data, machine learning has entered new and broad application
fields [11, 15]. Many recent literature reviews have summarized the
general and discriminative modeling approaches used in applications of
supervised, semi-supervised, and unsupervised machine learning
methods [14], such as:
A survey on the machine learning applications for the annotation of
sequence elements and epigenetic, proteomic, or metabolomic
data [14].
A comprehensive review on the omics and clinical data integration
techniques from a machine learning perspective [16].