Systems Biology (Methods in Molecular Biology)


3.3.3 NMR Data Postprocessing


NMR data postprocessing is a necessary step of the metabolomics
pipeline for extracting useful information about the state of the
biological system. This step helps to reduce unwanted sources of
variation in the data, such as dilution effects and subtle changes in
chemical shifts, line widths, and baselines across a series of spectra,
which can interfere with the outcome of the statistical analysis and
lead to false deductions.
NMR data postprocessing usually includes exclusion of
non-informative regions, binning, normalization, scaling, and
data export for subsequent multivariate statistical analysis.


  1. Remove the regions in the spectra that contain only noise and/or
    exogenous peaks: exclude the spectral regions outside the
    0.5–9.0 ppm window (the excluded region includes the TSP signal)
    and those containing the residual water (δ 4.7–5.0 ppm) and drug
    peaks.
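The exclusion in step 1 can be sketched as a boolean mask over the chemical-shift axis. This is a minimal illustration with NumPy, not the authors' software: the function name `exclude_regions` and its parameters are hypothetical, and drug-peak regions, which the protocol does not list, would have to be supplied by the user as extra (low, high) pairs.

```python
import numpy as np

def exclude_regions(ppm, intensities, keep=(0.5, 9.0), excluded=((4.7, 5.0),)):
    """Keep only points inside the `keep` window and outside every `excluded` region.

    ppm         : 1-D array of chemical shifts (any order)
    intensities : array whose last axis matches `ppm` (one spectrum or a stack)
    keep        : (low, high) window to retain, in ppm
    excluded    : iterable of (low, high) regions to drop, e.g. residual water;
                  drug-peak regions are study specific and must be added here
    """
    ppm = np.asarray(ppm, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    # Start from the informative window, then carve out each excluded region.
    mask = (ppm >= keep[0]) & (ppm <= keep[1])
    for low, high in excluded:
        mask &= ~((ppm >= low) & (ppm <= high))
    return ppm[mask], intensities[..., mask]
```

Applying the same mask to every spectrum keeps the variables aligned across samples, which the later binning and normalization steps rely on.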

  2. Reduce the dimensionality of the data by splitting the p-JRES
    spectra into small segments (bins or buckets) with variable widths
    ranging from 0.01 to 0.04 ppm, chosen so that each bin contains
    the same signals throughout all the spectra. If local peak
    shifts across the series of spectra are still observed, compress
    groups of bins into single bins or align the spectra. Then,
    integrate the signal within each bin (see Note 5).
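Variable-width binning as in step 2 can be sketched as follows. The sketch assumes that the bin edges have already been chosen (for example by inspecting the overlaid spectra) and that the points are evenly spaced, so that the sum of intensities in a bin is proportional to its integral. The names `integrate_bins` and `merge_bins` are hypothetical.

```python
import numpy as np

def integrate_bins(ppm, spectrum, edges):
    """Sum the intensities falling in each bin [edges[i], edges[i+1]).

    `edges` must be sorted in increasing order; bin widths may differ.
    """
    ppm = np.asarray(ppm, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    edges = np.asarray(edges, dtype=float)
    # np.digitize returns i with edges[i-1] <= x < edges[i]; shift to 0-based bins.
    idx = np.digitize(ppm, edges) - 1
    n_bins = len(edges) - 1
    inside = (idx >= 0) & (idx < n_bins)  # drop points outside the outer edges
    return np.bincount(idx[inside], weights=spectrum[inside], minlength=n_bins)

def merge_bins(areas, groups):
    """Compress each group of bin indices into a single bin by summing areas."""
    areas = np.asarray(areas, dtype=float)
    return np.array([areas[list(group)].sum() for group in groups])
```

Merging is the simpler of the two remedies for residual peak shifts named in the step; spectral alignment is a separate procedure and is not sketched here.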

  3. Normalize the binned spectra by applying the Probabilistic
    Quotient Normalization (PQN) [10, 11] method to make
    spectra comparable:
    (a) Set the total spectral area of every spectrum to 100.
    (b) Calculate the reference spectrum as the median spectrum
    (median of each variable/bin area) of the healthy group
    samples.
    (c) Calculate the quotient between the area of each spectral
    bin of the considered spectrum and that of the
    corresponding bin in the reference spectrum.
    (d) Calculate the median of all the quotients.
    (e) Divide all the variables of the considered spectrum by the
    median quotient.
    (f) Repeat steps c–e for all spectra.
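Steps (a) to (f) above can be sketched in a few vectorized NumPy operations. This is an illustrative sketch, not the authors' implementation: the function name `pqn_normalize` and the argument `reference_rows` (row indices of the healthy group samples) are hypothetical, and bins whose reference area is zero are skipped when computing quotients, which is an assumption made here to avoid division by zero.

```python
import numpy as np

def pqn_normalize(binned, reference_rows):
    """Probabilistic Quotient Normalization of a (samples x bins) matrix."""
    X = np.asarray(binned, dtype=float)
    # (a) Set the total spectral area of every spectrum to 100.
    X = 100.0 * X / X.sum(axis=1, keepdims=True)
    # (b) Reference spectrum: per-bin median over the reference (healthy) samples.
    reference = np.median(X[reference_rows], axis=0)
    # (c) Quotient of each bin to the reference bin (zero-reference bins skipped).
    valid = reference > 0
    quotients = X[:, valid] / reference[valid]
    # (d) Median quotient for each spectrum.
    median_quotient = np.median(quotients, axis=1, keepdims=True)
    # (e, f) Divide every variable of each spectrum by its median quotient.
    return X / median_quotient
```

Because the median quotient is robust to a few strongly changing bins, a spectrum that differs from the reference only by dilution plus one large peak is rescaled so that its unchanged bins match the reference again.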

  4. Scale the data by applying the generalized log (g-log)
    transformation [12, 13] to make the variables within spectra
    comparable:
    (a) Estimate the g-log transformation parameter (λ) by the
    maximum likelihood method using a set of five replicate
    measurements.
    (b) Obtain these five replicates from a single homogeneous
    pool of fecal water samples from healthy and pathological
    patients. Process the replicate spectra as described above


Metabolomics and Clinical Needs 331