Computational Systems Biology Methods and Protocols.7z

learning approaches is used to identify the most confident candidate disease genes by integrating prior knowledge into the differential gene expression between healthy and diseased individuals [97].

3.2.2 Association Community in Gut Metagenomic Research

Because almost all species in a microbial community cannot be isolated and cultivated, metagenomic methods have become one of the most important ways to analyze a microbial community as a whole [44, 98]. With the fast accumulation of metagenomic samples and the advance of next-generation sequencing techniques [99], it is now possible to qualitatively and quantitatively assess all taxa (features) in a microbial community [99]. The gut microbiota are reported to outnumber the cells of the human body by a factor of ten [100]. To understand the interactions between humans and the human microbiome, three hypotheses are widely considered [101]: (1) the human genome may work as part of a larger sensorimotor organ, the human “metagenome,” which, like our immune and nervous systems, responds to environmental change in the real world; (2) the human body is an ecosystem composed of multiple ecological niches and habitats in which cellular species collaborate and compete; and (3) human beings are “super-organisms,” which incorporate multiple symbiotic species into a massive individual. The complexity of the human body and microbiome, especially the gut microbiota, severely complicates the machine learning workflow.

Microbiome research has grown alongside advances in machine learning in recent years. As in microarray data analysis, the sample-by-taxon abundance matrix is the most commonly used data structure in microbiome studies. Machine learning methods are usually applied to such abundance data to determine which taxa differ between predefined groups of samples (e.g., diseased versus healthy) and to build classification models that use these discriminatory taxa to predict the group of a new sample. Traditional ecological assessment methods are limited in their ability to extract salient features; thus, classifying subjects and body sites is the main goal of supervised classification.
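The workflow described above can be illustrated with a minimal sketch. It assumes a toy sample-by-taxon count matrix (the taxon names, counts, and labels are hypothetical) and uses a simple nearest-centroid rule as a stand-in classifier; it is not one of the methods evaluated in the cited studies, only an illustration of the three steps: normalize counts to relative abundances, pick discriminatory taxa, and predict the group of a new sample.

```python
from statistics import mean

# Hypothetical toy data: rows are samples, columns are taxa (raw read counts).
TAXA = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]
COUNTS = [
    [50, 30, 10, 10],  # healthy
    [45, 35, 10, 10],  # healthy
    [10, 30, 50, 10],  # diseased
    [15, 25, 50, 10],  # diseased
]
LABELS = ["healthy", "healthy", "diseased", "diseased"]


def relative_abundance(row):
    """Convert one sample's raw counts to proportions that sum to 1."""
    total = sum(row)
    return [count / total for count in row]


def group_means(matrix, labels):
    """Mean relative abundance of every taxon within each group."""
    means = {}
    for group in sorted(set(labels)):
        rows = [r for r, lab in zip(matrix, labels) if lab == group]
        means[group] = [mean(col) for col in zip(*rows)]
    return means


def discriminatory_taxa(means, group_a, group_b, top_k):
    """Indices of the top_k taxa with the largest between-group mean difference."""
    diffs = [abs(a - b) for a, b in zip(means[group_a], means[group_b])]
    return sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:top_k]


def predict(sample, means, taxa_idx):
    """Assign the group whose centroid is closest on the selected taxa."""
    def sq_dist(centroid):
        return sum((sample[i] - centroid[i]) ** 2 for i in taxa_idx)
    return min(means, key=lambda g: sq_dist(means[g]))


abundance = [relative_abundance(row) for row in COUNTS]
centroids = group_means(abundance, LABELS)
selected = discriminatory_taxa(centroids, "healthy", "diseased", top_k=2)

# A new, unlabeled sample (hypothetical counts) is normalized the same way.
new_sample = relative_abundance([12, 28, 48, 12])
predicted_group = predict(new_sample, centroids, selected)
print([TAXA[i] for i in selected], predicted_group)
```

In practice the mean-difference ranking would be replaced by a statistical test or a model-based criterion, and the nearest-centroid rule by a classifier such as a random forest; the data flow from abundance matrix to prediction stays the same.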
Usually, the available features include the taxon relative abundances, the α diversity and β diversity, and the general association between environmental variables and operational taxonomic units (OTUs). Supervised classification can provide similar insight for microbiome studies as it has for microarray data [102]. Although the random forest (RF) method does not provide clear importance rankings of features, it has been widely applied and evaluated in many microbiome tasks. In a study comparing 18 major classification methods for microbiome studies [103], RFs emerged as one of the strongest performers and are suitable for moderately sized microbial communities. And in another comparison study of 21 machine

Revisit of Machine Learning Supported Biological and Biomedical Studies 195
