Computational Systems Biology: Methods and Protocols

concentration, temperature, and excessive dilution. In general,
spectra are initially aligned to an internal standard to remove the
global shifts. This is not sufficient, however, because individual
peaks within a spectrum can still be shifted, so a refined, local
alignment should also be carried out. For this purpose, various
alignment algorithms have been proposed, including interval correlated
shifting (icoshift) [91], correlation optimized warping (COW) [92],
fuzzy warping, and hierarchical cluster-based peak alignment (CluPA)
[93]. The NMR spectra are then divided into segments
(so-called bins or buckets), and the spectral intensity within each
bin is calculated. Binning attempts to minimize the effects of
variations in peak position caused by the pH, ionic strength, and
composition of a sample. It reduces the dimensionality of the data for
multivariate statistical analysis but also decreases the spectral
resolution. The most common type of spectral binning is equidistant
binning with a bin width of 0.04 ppm, in which every spectrum is
divided into evenly spaced integral regions, each with a spectral width
of 0.04 ppm. A drawback of equal-size binning is the lack of flexibility
in the bin boundaries, so a peak is sometimes split between two bins.
To prevent peak splitting, several newer binning methods have been
proposed [94–96]. As for peak normalization, constant sum normalization
(CSN) is the most commonly used method, in which the total
spectral peak area is treated as constant across all spectra. In addition,
histogram matching (HM) normalization, volume normalization,
and weight normalization (dry weight/wet weight) are also used in
metabolomics studies [55]. After normalization, a scaling step is
applied so that the highest peaks do not dominate the multivariate
data models. Several scaling approaches have been proposed, such as
auto scaling, Pareto scaling, mean centering, and range scaling. Among
these, Pareto scaling is the most commonly used; in it, each variable
is divided by the square root of the standard deviation of the column
values.
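
The binning, normalization, and scaling steps described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used by any particular software package: the function names (bin_spectrum, constant_sum_normalize, pareto_scale), the explicit start/stop arguments, and the toy spectra are all hypothetical. Pareto scaling is shown in its conventional form, in which each variable is mean-centered before being divided by the square root of its standard deviation; the sample standard deviation is assumed.

```python
import math
from statistics import mean, stdev


def bin_spectrum(ppm, intensity, start, stop, width=0.04):
    """Equidistant binning: sum the intensities in each `width`-ppm bin of [start, stop)."""
    n_bins = round((stop - start) / width)
    bins = [0.0] * n_bins
    for x, y in zip(ppm, intensity):
        if not (start <= x < stop):
            continue  # ignore points outside the binned region
        # min() guards against floating-point rounding at the upper edge
        idx = min(int((x - start) / width), n_bins - 1)
        bins[idx] += y
    return bins


def constant_sum_normalize(spectrum, total=1.0):
    """Constant sum normalization: rescale so the bin intensities add up to `total`."""
    s = sum(spectrum)  # assumed to be non-zero
    return [total * v / s for v in spectrum]


def pareto_scale(matrix):
    """Pareto scaling of a samples-by-variables matrix (needs at least two samples).

    Each column is mean-centered and divided by the square root of its
    sample standard deviation; constant columns are only centered.
    """
    scaled_columns = []
    for column in zip(*matrix):
        m = mean(column)
        s = stdev(column)
        denom = math.sqrt(s) if s > 0 else 1.0
        scaled_columns.append([(v - m) / denom for v in column])
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    # Two made-up spectra sharing one ppm axis (illustrative values, not real data).
    ppm = [0.01, 0.03, 0.05, 0.07, 0.10]
    spectra = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.0, 1.0, 1.0, 4.0]]
    binned = [bin_spectrum(ppm, s, start=0.0, stop=0.12) for s in spectra]
    normalized = [constant_sum_normalize(b, total=100.0) for b in binned]
    for row in pareto_scale(normalized):
        print([round(v, 3) for v in row])
```

Because the bin boundaries here are fixed, a peak lying on a boundary would be split between two neighboring bins, which is exactly the limitation of equal-size binning noted above.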
Although a number of metabolites are detected by 1D ¹H NMR
methods, the identification of metabolites in biological samples is
still a huge challenge because many peaks tend to overlap in
1D ¹H NMR spectra. This problem can be addressed with the 2D ¹H-¹H
TOCSY and 2D ¹H-¹³C HSQC approaches. After spectral data
preprocessing, known metabolites are identified by querying
libraries of NMR spectra (1D and 2D). Many NMR spectral
databases have now been constructed, including the Human
Metabolome Database (HMDB) [97, 98], the Biological Magnetic
Resonance Data Bank (BMRB) [99], the Madison-Qingdao
Metabolomics Consortium Database (MQMCD) [100], and the
MetaboLights database [101]. Various forms of queries can be
performed to extract metabolite information from these databases.
Users can submit the structure, chemical shifts, or J-coupling constants
of metabolites to these databases for query, and relevant biological

280 Jing Cheng et al.
