Computational Systems Biology Methods and Protocols

technical errors) in metabolomics studies. Currently, numerous
methods are applied to the normalization of metabolomics data
[113–115], and they are generally categorized into two groups:
(1) method-driven normalization approaches, which remove
unwanted variations case by case based on internal standards
or QC samples, and (2) data-driven normalization approaches, which
normalize the data based on the biases observed among metabolites [116].
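As one illustration of a data-driven approach, the minimal sketch below implements probabilistic quotient normalization (PQN). PQN is a widely used data-driven method chosen here only as an example; it is not necessarily among the methods reviewed in [113–116]. The sketch assumes a samples x metabolites intensity matrix with strictly positive values and uses the median sample as the reference profile; the function name `pqn_normalize` is hypothetical.

```python
import numpy as np


def pqn_normalize(X, total_first=True):
    """Probabilistic quotient normalization (hypothetical helper).

    X: 2-D array, rows = samples, columns = metabolite intensities (> 0).
    Returns the normalized matrix and the estimated dilution factor per sample.
    """
    X = np.asarray(X, dtype=float)
    if total_first:
        # Optional first step: scale each sample to unit total intensity.
        X = X / X.sum(axis=1, keepdims=True)
    # Reference profile: median intensity of each metabolite across samples.
    reference = np.median(X, axis=0)
    # Quotient of every metabolite in every sample relative to the reference.
    quotients = X / reference
    # The median quotient of a sample estimates its overall dilution factor.
    dilution = np.median(quotients, axis=1)
    return X / dilution[:, None], dilution


# Sample 2 is a two-fold "concentrated" copy of sample 1.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [2.0, 4.0, 6.0, 8.0],
              [1.0, 2.0, 3.0, 4.0]])
X_norm, d = pqn_normalize(X, total_first=False)
print(d)  # → [1. 2. 1.]
```

Using the median quotient rather than the total intensity makes the dilution estimate less sensitive to a few metabolites that change strongly between samples.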

5.4 Statistical Analysis


In metabolomics studies, datasets with very large numbers of variables
(high-dimensional data) are common, which poses great challenges for
statistical analysis. Chemometric analysis has become an integral part
of the metabolomic workflow because it provides interpretable models
for complex, intercorrelated data. MetaboAnalyst and XCMS Online have
recently been developed as chemometric tools for data analysis and are
widely adopted by the metabolomics research community.
Prior to multivariate statistical analysis, mean centering and
variance scaling are often conducted. Mean centering centers the data
in multidimensional space by subtracting the mean of each variable
from that variable's values. Variance scaling normalizes each variable
by dividing it by its own standard deviation, so that every variable
has unit variance. Mean centering combined with variance scaling is
the so-called auto scaling. Pareto scaling is a popular alternative,
in which each mean-centered variable is divided by the square root of
its own standard deviation; because low-variance variables are inflated
less than in auto scaling, it limits the undue weight given to
low-intensity, noise-dominated signals. In practice, auto scaling is
appropriate for scaling GC-MS metabolomics data, while Pareto scaling
is more popular for scaling LC-MS metabolomics data.
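The two scaling schemes described above can be sketched with NumPy as follows. This is a minimal illustration, assuming a samples x metabolites matrix, the sample standard deviation (`ddof=1`), and no constant columns (a zero standard deviation would cause division by zero); the function names are hypothetical.

```python
import numpy as np


def autoscale(X):
    """Auto scaling: mean-center each variable, divide by its own SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)  # sample standard deviation of each column
    return (X - X.mean(axis=0)) / sd


def pareto_scale(X):
    """Pareto scaling: mean-center each variable, divide by sqrt of its own SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    return (X - X.mean(axis=0)) / np.sqrt(sd)


# Two metabolites whose intensities differ by a factor of 10.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
print(autoscale(X).std(axis=0, ddof=1))     # → [1. 1.]
print(pareto_scale(X).std(axis=0, ddof=1))  # column 2 keeps a spread of sqrt(10)
```

After auto scaling both metabolites carry equal weight, whereas Pareto scaling leaves the more intense metabolite with a larger, though much reduced, spread.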
Once the raw data have been transformed into a data matrix (expression
matrix table), statistical analysis is performed to characterize the
differences between groups (e.g., a control group and a case group)
and to identify candidate biomarkers for group classification. In
metabolomics studies, the number of variables is large while the sample
size is small; therefore, variable selection or reduction is a crucial
step in statistical analysis. After variable selection, multivariate
analysis is conducted to extract metabolites whose abundance differs
between groups.
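Variable selection can be performed in many ways. As one common option, assumed here rather than prescribed by the text, the sketch below keeps metabolites whose group means differ according to Welch's t-test after Benjamini-Hochberg false discovery rate correction. It assumes separate samples x metabolites matrices for the control and case groups; the helper names are hypothetical, and `scipy.stats.ttest_ind` is the only library routine used besides NumPy. The data are simulated.

```python
import numpy as np
from scipy import stats


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * m / j.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q


def select_variables(X_control, X_case, alpha=0.05):
    """Column indices of metabolites with BH-adjusted Welch t-test q < alpha."""
    _, p = stats.ttest_ind(X_control, X_case, axis=0, equal_var=False)
    q = benjamini_hochberg(p)
    return np.flatnonzero(q < alpha), q


rng = np.random.default_rng(0)
control = rng.normal(0.0, 1.0, size=(20, 50))
case = rng.normal(0.0, 1.0, size=(20, 50))
case[:, :3] += 5.0  # only the first three metabolites truly differ
selected, q = select_variables(control, case)
print(selected)
```

Controlling the false discovery rate, rather than using raw p-values, matters here because many metabolites are tested at once.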
Commonly, multivariate analysis methods are divided into two
categories, i.e., unsupervised and supervised approaches.
In unsupervised approaches, the metabolite data matrix is partitioned
according to the inherent structure of the data, without using group
labels, and the relationships between the resulting parts can be
understood by comparing them. Principal component analysis (PCA)
[117] is a popular unsupervised approach, which attempts to uncover
hidden internal structure by constructing orthogonal components that
capture the maximum variance in the data; samples from different
groups can then be distinguished on plots of the leading components.
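A minimal PCA sketch, computed through the singular value decomposition of the mean-centered matrix with `numpy.linalg.svd`. The function name `pca`, the choice of two components, and the simulated data are assumptions for illustration; note that group labels are never passed to the function, which is what makes it unsupervised.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA of a samples x metabolites matrix via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # PCA is computed on mean-centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # sample coordinates
    loadings = Vt[:n_components].T                   # metabolite weights
    explained = s**2 / np.sum(s**2)                  # variance fraction per PC
    return scores, loadings, explained[:n_components]


# Simulated data: 10 controls and 10 cases; cases are shifted in 5 of 20 metabolites.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 0.5, size=(20, 20))
X[10:, :5] += 5.0
scores, loadings, explained = pca(X)
print(explained)  # fraction of total variance captured by PC1 and PC2
```

The sign of each component is arbitrary, so controls may fall on either the positive or the negative side of PC1; only the separation between groups is meaningful. In practice the matrix would usually be auto- or Pareto-scaled before this step.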
Hierarchical cluster analysis (HCA) [118] is another unsupervised
method commonly used in metabolomics studies. HCA clusters the
metabolite data hierarchically according to pairwise similarity,
dividing the samples into groups that are usually displayed as a
dendrogram.
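A minimal HCA sketch, assuming SciPy is available: `scipy.cluster.hierarchy.linkage` builds the cluster tree from the sample rows and `fcluster` cuts it into a chosen number of groups. Ward linkage with Euclidean distance and the two-cluster cut are illustrative choices, not recommendations from the text, and the data are simulated.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(2)
# Simulated samples x metabolites matrix with two groups of 8 samples each.
X = rng.normal(0.0, 0.5, size=(16, 10))
X[8:, :4] += 4.0

# Agglomerative clustering of samples (rows), Euclidean distance, Ward linkage.
Z = linkage(X, method="ward", metric="euclidean")
# Cut the tree into (at most) two clusters.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

The linkage matrix `Z` can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram (plotting requires matplotlib). Clustering the transposed matrix instead groups metabolites rather than samples.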

282 Jing Cheng et al.
