3.3.3 NMR Data Postprocessing
NMR data postprocessing is a necessary step of the metabolomics
pipeline for extracting useful information about the state of the
biological system. This step reduces sources of variation in
the data, such as dilution effects and subtle changes in chemical shifts,
line widths, and baseline across a series of spectra, which can interfere
with the outcome of the statistical analysis and lead to false
deductions.
NMR data postprocessing usually includes exclusion of
non-informative regions, binning, normalization, scaling, and
data export for subsequent multivariate statistical analysis.
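As a rough illustration of the stages named above, the following Python sketch strings together region exclusion, bin integration, PQN, and a g-log transform using only the standard library. It is a minimal sketch under stated assumptions, not the software used in this protocol: the function names are hypothetical, bin areas are approximated by summing intensities, bins with a zero reference value are skipped when forming quotients, and the g-log is written in one commonly used form, ln(x + sqrt(x^2 + λ)), with λ supplied by the caller rather than estimated by maximum likelihood.

```python
import math
from statistics import median


def exclude_regions(ppm, intensities, keep=(0.5, 9.0), drop=((4.7, 5.0),)):
    """Keep points inside `keep` and outside every interval listed in `drop`.

    Drug-peak regions are sample specific and would be added to `drop`.
    """
    kept = [
        (p, y)
        for p, y in zip(ppm, intensities)
        if keep[0] <= p <= keep[1] and not any(lo <= p <= hi for lo, hi in drop)
    ]
    return [p for p, _ in kept], [y for _, y in kept]


def integrate_bins(ppm, intensities, edges):
    """Sum intensities in each [edges[k], edges[k+1]) bin; widths may vary."""
    areas = [0.0] * (len(edges) - 1)
    for p, y in zip(ppm, intensities):
        for k in range(len(edges) - 1):
            if edges[k] <= p < edges[k + 1]:
                areas[k] += y
                break
    return areas


def pqn_normalize(spectra, reference_rows):
    """Probabilistic Quotient Normalization of binned spectra.

    spectra: one list of bin areas per sample.
    reference_rows: indices of the samples that define the reference
    spectrum (the healthy group in this protocol).
    """
    # (a) Set the total area of every spectrum to 100.
    scaled = [[100.0 * v / sum(row) for v in row] for row in spectra]
    # (b) Reference spectrum: per-bin median over the reference samples.
    n_bins = len(scaled[0])
    reference = [
        median(scaled[i][j] for i in reference_rows) for j in range(n_bins)
    ]
    normalized = []
    for row in scaled:
        # (c) Quotients against the reference (assumption: skip zero bins).
        quotients = [v / r for v, r in zip(row, reference) if r > 0]
        # (d) Median quotient, then (e) divide the whole spectrum by it.
        q = median(quotients)
        normalized.append([v / q for v in row])
    return normalized


def glog(x, lam):
    """Generalized log of a single value for a given parameter lam."""
    return math.log(x + math.sqrt(x * x + lam))
```

For example, `pqn_normalize(binned, reference_rows=[0, 1])` would treat the first two rows of `binned` as the healthy group, and `glog` would then be applied to every normalized bin area with the λ estimated from the replicate measurements.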
- Remove the regions of the spectra that contain only noise and/or
exogenous peaks: exclude the spectral regions outside
the 0.5–9.0 ppm window (which removes the TSP signal) and those
containing the residual water (δ 4.7–5.0 ppm) and drug peaks.
- Reduce the dimensionality of the data by splitting the p-JRES spectra
into small segments (bins or buckets) with variable widths
ranging from 0.01 to 0.04 ppm, so that each bin contains
the same signals throughout all the spectra. If local peak
shifts across the series of spectra are still observed, merge groups
of bins into single bins or align the spectra. Then,
integrate the signal within each bin (see Note 5).
- Normalize the binned spectra by applying the Probabilistic
Quotient Normalization (PQN) [10, 11] method to make the
spectra comparable:
(a) Set the total spectral area of every spectrum to 100.
(b) Calculate the reference spectrum as the median spectrum
(median of each variable/bin area) of the healthy group
samples.
(c) Calculate the quotient between the area of each spectral
bin of the considered spectrum and that of the
corresponding bin in the reference spectrum.
(d) Calculate the median of all the quotients.
(e) Divide all the variables of the considered spectrum by the
median quotient.
(f) Repeat steps c–e for all spectra.
- Scale the data by applying the generalized log (g-log) transformation
[12, 13] to make the variables within spectra
comparable:
(a) Estimate the g-log transformation parameter (λ) by the
maximum likelihood method using a set of five replicate
measurements.
(b) Obtain these five replicates from a single homogeneous
pool of fecal water samples from healthy and pathological
patients. Process the replicate spectra as described above