Computational Systems Biology Methods and Protocols.7z

technical errors) in metabolomics studies. Numerous methods are currently applied to normalize metabolomics data [113–115], and they generally fall into two groups: (1) method-driven normalization approaches, which remove unwanted variation case by case based on internal standards or QC samples, and (2) data-driven normalization approaches, which normalize data based on biases among the metabolites [116].
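
The two groups of normalization approaches can be illustrated with a minimal sketch. It assumes a samples-by-metabolites intensity table held as a list of rows. Dividing by a spiked internal standard stands in for the method-driven group, and dividing by the total signal of each sample stands in for the data-driven group. Both choices, and the function names, are illustrative assumptions rather than methods prescribed by the text.

```python
def normalize_by_internal_standard(samples, standard_index):
    """Method-driven sketch: divide each sample by its internal-standard intensity.

    samples: list of rows, one row of metabolite intensities per sample.
    standard_index: column holding the internal standard (hypothetical layout).
    """
    normalized = []
    for row in samples:
        standard = row[standard_index]
        if standard <= 0:
            raise ValueError("internal standard intensity must be positive")
        normalized.append([value / standard for value in row])
    return normalized


def normalize_by_total_intensity(samples):
    """Data-driven sketch: divide each sample by the sum of its intensities."""
    normalized = []
    for row in samples:
        total = sum(row)
        if total <= 0:
            raise ValueError("total intensity must be positive")
        normalized.append([value / total for value in row])
    return normalized


# Two injections of the same material, the second twice as concentrated.
samples = [[100.0, 50.0, 10.0], [200.0, 100.0, 20.0]]
by_standard = normalize_by_internal_standard(samples, standard_index=2)
by_total = normalize_by_total_intensity(samples)
```

In this toy input the second sample differs from the first only by a dilution factor, so either normalization makes the two rows identical.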

5.4 Statistical Analysis

In metabolomics studies, datasets with very many variables and high dimensionality are the norm, which poses a considerable statistical challenge. Chemometric analysis has become an integral part of the metabolomic workflow because it provides interpretable models for complex intercorrelated data. MetaboAnalyst and XCMS Online have been developed as chemometric tools for data analysis and are widely adopted by the metabolomics research community.
Prior to multivariate statistical analysis, mean centering and variance scaling are often conducted. Mean centering places the data around the origin of the multidimensional space by subtracting the mean value of each variable from that variable. Variance scaling normalizes the variable values by dividing each variable by its own standard deviation. Combining mean centering with variance scaling gives so-called auto scaling. Pareto scaling is a popular alternative in which each variable is divided by the square root of its standard deviation; it thereby reduces the effects of undesired noise. In practice, auto scaling is appropriate for GC-MS metabolomics data, while Pareto scaling is more popular for LC-MS metabolomics data.
Once the raw data have been transformed into an expression matrix, statistical analysis is performed to characterize the relationship between groups (control group and case group) and to identify candidate biomarkers for group classification. In metabolomics studies, the number of variables is large and the sample size is small; therefore, variable selection or reduction is a crucial step in statistical analysis. After variable selection, multivariate analysis is conducted to extract metabolites whose abundance differs between groups. Multivariate analysis methods are commonly divided into two categories: unsupervised approaches and supervised approaches.
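
The centering and scaling operations described above can be sketched in a few lines. This is a minimal illustration, assuming a samples-by-variables matrix stored as a list of rows and the sample standard deviation (n - 1 denominator). The function name `scale_columns` and the `method` argument are hypothetical.

```python
import math
import statistics


def scale_columns(matrix, method="auto"):
    """Center each variable (column) and optionally scale it.

    matrix: list of rows (samples), each a list of variable intensities.
    method: "center" (mean centering only),
            "auto"   (divide by the variable's standard deviation),
            "pareto" (divide by the square root of that standard deviation).
    """
    if method not in ("center", "auto", "pareto"):
        raise ValueError(f"unknown scaling method: {method}")

    scaled_columns = []
    for column in zip(*matrix):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)  # sample SD of this variable only
        if method == "center":
            divisor = 1.0
        elif method == "auto":
            divisor = sd
        else:  # pareto
            divisor = math.sqrt(sd)
        if divisor == 0:
            # Constant variable: centered values are all zero, leave them as is.
            divisor = 1.0
        scaled_columns.append([(value - mean) / divisor for value in column])

    # Transpose back to samples-by-variables.
    return [list(row) for row in zip(*scaled_columns)]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
auto_scaled = scale_columns(data, "auto")
pareto_scaled = scale_columns(data, "pareto")
```

After auto scaling every variable has unit standard deviation, whereas after Pareto scaling a variable keeps a standard deviation equal to the square root of its original one, so high-variance variables retain somewhat more weight.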
In the unsupervised approach, the metabolite data matrix is partitioned on the basis of the inherent characteristics of the data, and the relationships among the resulting parts can be understood by comparing them. Principal component analysis (PCA) [117] is a popular unsupervised approach that attempts to uncover hidden internal structure by constructing components that capture the maximum variance in the data, which often separates the sample groups. Hierarchical cluster analysis (HCA) [118] is another unsupervised method commonly used in metabolomics studies. HCA clusters the metabolite data to divide the instance space.
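
The two unsupervised methods named above can be illustrated with a small standard-library sketch. It is not how dedicated packages implement them. The first principal component is found here by power iteration on the covariance matrix, a simple substitute for the decomposition routines used in practice. The clustering uses single linkage with Euclidean distance and stops at a requested number of clusters, and both of those choices are assumptions. All function names are hypothetical.

```python
import math


def first_principal_component(matrix, iterations=200):
    """Return (loadings, scores) of PC1 for a samples-by-variables matrix."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in matrix]
    # Sample covariance matrix (p x p).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
           for i in range(p)]
    # Power iteration converges to the direction of maximum variance,
    # provided the start vector is not orthogonal to that direction.
    vec = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance along the current direction
        vec = [x / norm for x in nxt]
    scores = [sum(r[j] * vec[j] for j in range(p)) for r in centered]
    return vec, scores


def single_linkage_clusters(matrix, n_clusters):
    """Agglomerative clustering: merge the two closest clusters repeatedly."""
    if not 1 <= n_clusters <= len(matrix):
        raise ValueError("n_clusters must be between 1 and the sample count")
    clusters = [[i] for i in range(len(matrix))]
    while len(clusters) > n_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(matrix[a], matrix[b])
                        for a in clusters[x] for b in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(cluster) for cluster in clusters]


profiles = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.0]]
loadings, scores = first_principal_component(profiles)
groups = single_linkage_clusters(
    [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]], n_clusters=2)
```

The scores place each sample along the direction of greatest variance, while the clustering returns lists of sample indices. For real datasets an established numerical library would be the more robust choice.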

282 Jing Cheng et al.
