Systems Biology (Methods in Molecular Biology)

(i.e., selection of exclusion regions, binning, and
normalization).


  1. Mean-center the data: for each variable (bin), subtract its
    mean value across all samples from the original data of that bin.
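Mean-centering can be sketched in a few lines of NumPy. This assumes the binned, normalized spectra are already held in an (n_samples, n_bins) matrix. The function name `mean_center` and the toy numbers are illustrative only and do not come from the chapter.

```python
import numpy as np

def mean_center(X: np.ndarray) -> np.ndarray:
    """Subtract each column (bin) mean from that column.

    X is assumed to be an (n_samples, n_bins) matrix of binned,
    normalized NMR intensities (hypothetical layout).
    """
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=0, keepdims=True)

# Toy example: 3 spectra x 2 bins (made-up numbers)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Xc = mean_center(X)
print(Xc)  # each column of Xc now has mean 0
```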


3.4 Statistical Analysis of NMR Data



  1. Reduce the dimensionality of the data by Principal Component
    Analysis (PCA). This process assigns each sample a score on each
    extracted component (principal component, PC). By construction,
    the extracted components are mutually uncorrelated (orthogonal),
    so they capture non-overlapping features of the studied system.
    Use the component scores to plot PC maps (score plots) of the
    samples for the components that best indicate the differences in
    metabolic profile between the classes (healthy or disease groups).

  2. Perform a separate inferential test (t-test) on the scores of
    each component to check the statistical significance of the
    between-group differences.

  3. Compare the metabolic profiles with the clinical features of
    each patient by Pearson’s correlation and/or ANOVA, with the
    component scores as dependent variables and potential modulating
    or confounding factors as regressors (sources of variation).

  4. After verifying that potential confounding factors have no
    effect on the PCs, apply linear discriminant analysis (LDA) to
    the components to build a predictive model that classifies
    patients into healthy or disease groups.

  5. If confounding factors have statistically significant effects
    on the discriminant components, correct for these factors (by
    analysis of covariance or partial correlation analysis). This
    allows estimating the actual degree of association between the
    LDA-based class membership probability and clinical status.
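Steps 1 to 5 above can be sketched end to end with SciPy and scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code. The data matrix, the class labels and the `age` covariate are synthetic. The five retained components, the 5-fold cross-validation and the helper `partial_corr` (the residual method for partial correlation) are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Hypothetical data (NOT the study's): 40 samples x 200 bins, already
# binned and normalized; y codes class (0 = healthy, 1 = disease).
n_per_group, n_bins = 20, 200
y = np.repeat([0, 1], n_per_group)
X = rng.normal(size=(2 * n_per_group, n_bins))
X[y == 1, :10] += 1.0                       # simulated shift in 10 bins
age = rng.uniform(2, 12, size=y.size)       # hypothetical covariate

# Step 1: PCA (sklearn mean-centers each bin internally); one score per PC.
pca = PCA(n_components=5)
scores = pca.fit_transform(X)

# Step 2: a separate two-sample t-test on each component's scores.
for k in range(scores.shape[1]):
    t, p = stats.ttest_ind(scores[y == 0, k], scores[y == 1, k])
    print(f"PC{k + 1}: t = {t:.2f}, p = {p:.3g}")

# Step 3: Pearson correlation of each component with the covariate.
for k in range(scores.shape[1]):
    r, p = stats.pearsonr(scores[:, k], age)
    print(f"PC{k + 1} vs age: r = {r:.2f}, p = {p:.3g}")

# Step 4: LDA on the component scores; 5-fold cross-validated accuracy.
lda = LinearDiscriminantAnalysis()
acc = cross_val_score(lda, scores, y, cv=5)
print(f"mean CV accuracy: {acc.mean():.2f}")

# Step 5: partial correlation (residual method) between the LDA class
# probability and clinical status, controlling for the covariate.
def partial_corr(a, b, z):
    Z = np.column_stack([np.ones_like(z), z])          # intercept + covariate
    ra = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]  # residuals of a on z
    rb = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]  # residuals of b on z
    return stats.pearsonr(ra, rb)[0]

prob = lda.fit(scores, y).predict_proba(scores)[:, 1]
print(f"partial r = {partial_corr(prob, y.astype(float), age):.2f}")
```

For brevity, PCA is fit on all samples before cross-validating the LDA. A stricter validation would refit PCA inside each fold. Only the partial correlation coefficient is reported, because the p-value that `pearsonr` computes on residuals does not account for the degree of freedom used by the covariate.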


We investigated the NMR data by Principal Component Analysis
(PCA) of samples from young patients with cystic fibrosis (CF) and
healthy children. Five components are sufficient to explain 40% of
the variance in the metabolic data. The score plot in Fig. 1 shows
a clear separation between CF and healthy children on PC1
(p = 0.001, t-test) and PC4 (p < 0.0001, t-test).
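A statement such as "five components explain 40% of the variance" refers to the cumulative explained-variance ratio of the leading PCs. The sketch below shows how that figure can be computed from the singular values of the mean-centered data. The function name `n_components_for`, the 0.40 target and the random matrix are illustrative assumptions. They do not reproduce the study's data or result.

```python
import numpy as np

def n_components_for(X, target=0.40):
    """Return the smallest k such that the first k PCs explain >= target
    of the total variance, plus the cumulative explained-variance ratios."""
    Xc = X - X.mean(axis=0)                     # mean-center each bin
    s = np.linalg.svd(Xc, compute_uv=False)     # singular values, descending
    ratio = s**2 / np.sum(s**2)                 # variance share per PC
    cum = np.cumsum(ratio)
    k = int(np.searchsorted(cum, target)) + 1   # first index with cum >= target
    k = min(k, cum.size)                        # guard against rounding at target ~ 1
    return k, cum[:k]

# Hypothetical data (not the study's): 30 samples x 100 bins.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 100))
k, cum = n_components_for(X, target=0.40)
print(k, np.round(cum, 3))
```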
In this study, because the metabolic status of the CF patients could
be influenced by several variables, such as age, gender, and
antibiotic and probiotic intake, we assessed whether any of these
factors could influence the separation between CF patients and
healthy children. To address age and gender as potential
confounding factors, the metabolic profiles and the clinical features
of each child were compared by Pearson’s correlation, while for
assessing antibiotic and probiotic intake, the metabolic

332 Luca Casadei et al.
