learning approaches is used to identify the most confident candidate disease genes by integrating prior knowledge into the differential gene expression between healthy and diseased individuals [97].
3.2.2 Association Community in Gut Metagenomic Research
Because the vast majority of species in a microbial community cannot be isolated and cultivated, metagenomic methods have become one of the most important approaches for analyzing a microbial community as a whole [44, 98]. With the rapid accumulation of metagenomic samples and advances in next-generation sequencing techniques [99], it is now possible to assess all taxa (features) in a microbial community both qualitatively and quantitatively [99].
Gut microbial cells have been estimated to be ten times as numerous as the cells of the human body [100]. To understand the interactions between humans and the human microbiome, three hypotheses are widely considered [101]: (1) the human genome may work as part of a larger sensorimotor organ, namely the human “metagenome,” which, like our immune and nervous systems, responds to changes in the real-world environment; (2) the human body is an ecosystem composed of multiple ecological niches and habitats in which cellular species collaborate and compete; and (3) human beings are “superorganisms” that incorporate multiple symbiotic species into a single massive individual. The complexity of the human body and its microbiome, especially the gut microbiota, severely complicates the machine learning workflow.
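A common supervised workflow in microbiome studies operates on a sample-by-taxon abundance matrix in two steps: find the taxa that differ between predefined groups of samples, then use those taxa to predict the group of a new sample. The sketch below illustrates both steps on a small hypothetical matrix with made-up values. Random forests are the classifier most discussed in this section; to keep the example dependency-free, it substitutes a Welch t statistic for ranking taxa and a simple nearest-centroid classifier, so it shows the shape of the workflow rather than any specific published pipeline.

```python
import statistics

# Hypothetical sample-by-taxon relative-abundance matrix with group labels.
samples = [
    ([0.40, 0.10, 0.30, 0.20], "healthy"),
    ([0.45, 0.05, 0.35, 0.15], "healthy"),
    ([0.38, 0.12, 0.28, 0.22], "healthy"),
    ([0.10, 0.50, 0.20, 0.20], "diseased"),
    ([0.15, 0.45, 0.18, 0.22], "diseased"),
    ([0.12, 0.48, 0.22, 0.18], "diseased"),
]
labels = ("healthy", "diseased")

def group_values(taxon, label):
    """Abundances of one taxon across all samples carrying a given label."""
    return [x[taxon] for x, y in samples if y == label]

def welch_t(a, b):
    """Welch t statistic: difference in means divided by its standard error."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se

# Step 1: rank taxa by how strongly they differ between groups; keep the top two.
n_taxa = len(samples[0][0])
scores = {t: abs(welch_t(group_values(t, "healthy"), group_values(t, "diseased")))
          for t in range(n_taxa)}
selected = sorted(scores, key=scores.get, reverse=True)[:2]

# Step 2: nearest-centroid classifier on the selected (discriminatory) taxa.
centroids = {lab: [statistics.mean(group_values(t, lab)) for t in selected]
             for lab in labels}

def predict(profile):
    """Assign a new sample to the group whose centroid is closest (squared Euclidean)."""
    feats = [profile[t] for t in selected]
    return min(centroids,
               key=lambda lab: sum((f - c) ** 2 for f, c in zip(feats, centroids[lab])))

print("selected taxa:", selected)
print("new sample ->", predict([0.11, 0.47, 0.21, 0.21]))
```

In a real study, the taxon-ranking step should be repeated inside each cross-validation fold, because selecting features on the full data set before evaluating the classifier inflates accuracy estimates.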
Microbiome research has grown alongside advances in machine learning in recent years. As in microarray data analysis, the sample-by-taxon abundance matrix is the most commonly used data structure in microbiome studies. Machine learning methods are usually applied to such abundance data to determine which taxa differ between predefined groups of samples (e.g., diseased versus healthy) and to build classification models that use these discriminatory taxa to predict the group of a new sample. Because traditional ecological assessment methods are weak at extracting salient features, their use has been limited; thus, classifying subjects and body sites has become the main goal of supervised classification. The available features usually include taxon relative abundances, α diversity and β diversity, and general associations between environmental variables and operational taxonomic units (OTUs). Supervised classification can offer insights for microbiome studies similar to those it has provided for microarray data [102]. Although the random forest (RF) method does not provide a clear ranking of feature importance, it has been widely applied and evaluated in many microbiome-related tasks. In a study comparing 18 major classification methods for microbiome studies [103], RFs were among the strongest performers and were found suitable for moderately sized microbial communities. In another comparison study of 21 machine
Revisit of Machine Learning Supported Biological and Biomedical Studies 195