pathway metabolite involved are retrieved as a result. The features
of different databases are summarized by Ellinger et al. [102]. In
addition, a customized database named ¹³C-TOCCATA was recently
developed [103], which contains complete ¹H and ¹³C chemical
shift information on individual spin systems and isomeric
states of common metabolites. Unknown metabolites
need to be separated temporarily from target metabolites, which
can simplify the NMR spectra to a great extent. Then, the target
metabolite can be identified by combining 2D NMR (TOCSY,
COSY, HMBC, HSQC) with MS according to routine procedures
[104]. The functional groups can be identified conclusively by
their ¹H and ¹³C chemical shifts. The skeletal connectivity is
deduced by combining data from both homonuclear (COSY,
TOCSY) and heteronuclear (HSQC and HMBC) correlation
spectroscopy. The relative stereochemistry is inferred from NOE
correlations between protons and coupling constants (ⁿJ(H,H), ⁿJ(C,H)).
Finally, the proposed structure is verified in light of all of the
observed data [104]. With new NMR pulse sequences emerging,
unknown metabolites can be identified in complex mixtures
[105]. These methods have been demonstrated for covariance ¹H-¹H
TOCSY and ¹³C-¹H HSQC-TOCSY spectra and for triple-rank
correlation spectra constructed from pairs of ¹³C-¹H HSQC and
¹³C-¹H HSQC-TOCSY spectra [105].
5.3 Signal Correction and Data Normalization
In metabolomics, samples from different batches are commonly
acquired over several months or even years. The signal drift of
metabolites over time and across batches is a major confounding
factor in large-scale metabolomics studies. Unexpected variation
in metabolite ion peaks during data acquisition (intra- and
inter-batch) is unavoidable and probably arises from sample
handling and preparation, the degradation of column efficiency,
MS instrument contamination, and nonlinear drift over long runs
[106–108]. To correct signal drift, quality control (QC) samples
with a matrix composition similar to that of the biological samples
under study must be analyzed over the entire time course of a
large-scale study. QC samples help correct signal drift and remove
peaks with extreme changes so as to eliminate batch effects
[109–111]. Currently, popular signal correction methods include
QC-based robust locally estimated scatterplot smoothing signal
correction (QC-RLSC) [39] and support vector regression (SVR)
[112]. After signal correction, peak variation should be checked:
peaks with variation above a threshold (e.g., 20% and 30%
for UPLC-MS and GC-MS, respectively) are removed so that
subsequent statistical analysis can be conducted correctly.
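The QC-based correction and the subsequent variation check can be illustrated with a minimal sketch for a single metabolite feature. It is not the published QC-RLSC implementation: it fits a plain (non-robust) locally weighted linear curve to the QC intensities as a function of injection order, divides every intensity by the fitted value, and rescales to the QC median. The rescaling convention, the use of relative standard deviation (RSD) across QC injections as the measure of peak variation, the function names, the span of 0.75, and the synthetic data are all assumptions made for illustration.

```python
import math
import statistics


def tricube(u):
    """Tricube kernel: weight decays from 1 at u = 0 to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def local_linear_fit(x0, xs, ys, span=0.75):
    """Evaluate a locally weighted linear regression of ys on xs at x0."""
    k = min(len(xs), max(2, math.ceil(span * len(xs))))
    # Bandwidth: distance to the k-th nearest QC injection, slightly widened
    # so that the k-th point keeps a small non-zero weight.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.0001 or 1.0
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def correct_drift(order, intensity, is_qc, span=0.75):
    """Divide each intensity by the QC-fitted drift curve, then rescale
    to the median QC intensity (an assumed convention)."""
    qc_x = [o for o, q in zip(order, is_qc) if q]
    qc_y = [v for v, q in zip(intensity, is_qc) if q]
    target = statistics.median(qc_y)
    return [v / local_linear_fit(o, qc_x, qc_y, span) * target
            for o, v in zip(order, intensity)]


def rsd_percent(values):
    """Relative standard deviation (sample SD / mean) in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def keep_feature(corrected, is_qc, threshold=20.0):
    """Keep a feature only if its QC RSD after correction is within threshold."""
    qc = [v for v, q in zip(corrected, is_qc) if q]
    return rsd_percent(qc) <= threshold


# Synthetic feature (made-up numbers): the signal falls 2% per injection.
order = list(range(21))
is_qc = [i % 5 == 0 for i in order]          # QC at injections 0, 5, 10, 15, 20
true = [1000.0 if q else 500.0 for q in is_qc]
raw = [t * (1.0 - 0.02 * i) for t, i in zip(true, order)]

corrected = correct_drift(order, raw, is_qc)
qc_raw = [v for v, q in zip(raw, is_qc) if q]
qc_corrected = [v for v, q in zip(corrected, is_qc) if q]
print("QC RSD before:", rsd_percent(qc_raw))
print("QC RSD after: ", rsd_percent(qc_corrected))
print("feature kept: ", keep_feature(corrected, is_qc, threshold=20.0))
```

In practice the fit would be repeated for every feature and every batch, and the threshold passed to `keep_feature` would follow the platform (e.g., 20% for UPLC-MS or 30% for GC-MS, as stated above). A robust variant would add reweighting iterations to down-weight outlying QC injections.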
After signal correction, data normalization is an essential step
before statistical analysis to remove unwanted variations (e.g.,
unwanted biological variations, experimental variations, and