Thus, we simultaneously measured the gene expression levels of
24 enzymes involved in this process, selected to give good
coverage of the main DNA repair pathways [14], in primary cell
cultures from 35 gastric cancer patients, in order to obtain a
quantitative experimental “summary” of the complex network
described above.
The statistical units (rows) of the raw data matrix are the patients,
and the variables (columns) are the 24 DNA repair enzymes, whose
intermingled network of interactions determines the experimentally
observed correlation structure among the expression values of the
genes coding for the selected enzymes.
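As a minimal sketch of this layout, assuming NumPy and purely hypothetical placeholder values (the study's measurements are not reproduced here), the patients × enzymes matrix and its enzyme-by-enzyme correlation structure can be set up as follows. The names `X` and `correlation_structure` are our own.

```python
import numpy as np

# Hypothetical stand-in for the raw data matrix: rows are patients
# (statistical units), columns are DNA repair enzymes (variables).
# Values are random placeholders, NOT the study's measurements.
N_PATIENTS, N_ENZYMES = 35, 24
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(N_PATIENTS, N_ENZYMES))

def correlation_structure(data: np.ndarray) -> np.ndarray:
    """Return the enzyme-by-enzyme Pearson correlation matrix.

    rowvar=False tells NumPy that each column is a variable and each
    row an observation, matching the patients x enzymes layout.
    """
    return np.corrcoef(data, rowvar=False)

R = correlation_structure(X)
print(R.shape)  # (24, 24): one row/column per enzyme
```

The correlation matrix, not the raw values, is what PCA decomposes when the variables are standardized, which is why the network of interactions shows up through it.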
The idea at the basis of PCA is that each single observable
(in our case, the expression of an enzyme across the different patient
cell lines) derives its particular value from a combination of hidden,
mutually uncorrelated factors impinging on it. The hidden factors are
“the real things”; the observables are the probes of those factors.
The same holds for a chemical mixture, whose observed spectrum
results from the combination of the elementary spectra of the
molecules composing it: the spectral peaks are the “probes,” while
the molecules are the “real things.” The molecules in the mixture
are the components, and their relative concentrations correspond to
the percentage of variance explained by each component. This is
more than an analogy: spectroscopic instruments used in analytical
chemistry often embed a PCA procedure to deconvolve the acquired
spectra.
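A toy version of the mixture analogy may help. The sketch below, assuming NumPy and entirely invented spectra and concentrations, mixes three Gaussian "elementary spectra" in random proportions across 50 samples and runs PCA through the SVD of the mean-centred data. It recovers the number of molecular species (three dominant components, the rest noise); it does not by itself recover the pure spectra, which would require additional constraints.

```python
import numpy as np

# Toy version of the chemical-mixture analogy (all numbers are invented):
# three "molecules" each have an elementary spectrum, and every sample's
# observed spectrum is a concentration-weighted sum of them plus noise.
wavelengths = np.linspace(0.0, 1.0, 200)

def peak(center: float, width: float) -> np.ndarray:
    """Gaussian-shaped band used as a hypothetical elementary spectrum."""
    return np.exp(-((wavelengths - center) ** 2) / (2.0 * width ** 2))

elementary = np.vstack([peak(0.2, 0.03), peak(0.5, 0.05), peak(0.8, 0.04)])  # (3, 200)

rng = np.random.default_rng(1)
concentrations = rng.uniform(0.0, 1.0, size=(50, 3))  # 50 hypothetical samples
spectra = concentrations @ elementary + rng.normal(0.0, 1e-3, size=(50, 200))

# PCA via SVD of the mean-centred data: squared singular values are
# proportional to the variance carried by each component.
centred = spectra - spectra.mean(axis=0)
singular_values = np.linalg.svd(centred, compute_uv=False)
variance = singular_values ** 2
share = variance / variance.sum()

print(np.round(share[:5], 4))  # three large shares, then a near-zero tail
```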
The observed expression value of each enzyme corresponds to a
weighted sum of the contributions arising from its participation in
the different pathways (components). As stated in the previous
paragraph, this implies a radical change in the way we look at
nature: enzyme expression values are a consequence (and not a
cause) of the processes (pathways) going on in the cells.
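In standard PCA notation (the symbols below are our own choice, not the chapter's), this weighted-summation view can be written for patient $i$ and enzyme $j$ as:

```latex
x_{ij} = \sum_{k=1}^{K} t_{ik}\, p_{jk} + e_{ij},
\qquad i = 1,\dots,35, \quad j = 1,\dots,24,
```

where $x_{ij}$ is the standardized expression of enzyme $j$ in patient $i$, $t_{ik}$ is the score of patient $i$ on component $k$ (the activity of a pathway in that patient), $p_{jk}$ is the loading of enzyme $j$ on component $k$ (its participation in that pathway), and $e_{ij}$ is the residual not captured by the $K$ retained components.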
PCA applied to the data set gave rise to a “bona fide” three-
component solution accounting for the “signal” part of the
information; Table 1 reports the percentage of variance explained
by each component.
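The percentages in a table like Table 1 come from the eigenvalues of the correlation matrix. A minimal sketch, assuming NumPy and PCA on standardized variables (so the eigenvalues sum to the number of enzymes), is shown below; the data are hypothetical, generated from three latent factors plus noise, and are not the values behind Table 1 (which reports 41%, 13%, and 10% for the first three components of the real data).

```python
import numpy as np

def explained_variance_percent(data: np.ndarray) -> np.ndarray:
    """Percentage of total variance explained by each principal component.

    PCA is run on the correlation matrix (i.e. on standardised variables),
    so the eigenvalues sum to the number of variables and each one divided
    by that total gives the component's share of the variance.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh returns ascending order
    return 100.0 * eigenvalues / eigenvalues.sum()

# Hypothetical data: 35 "patients" x 24 "enzymes" generated from three
# latent factors plus noise. These are NOT the values behind Table 1.
rng = np.random.default_rng(2)
scores = rng.normal(size=(35, 3))
loadings = rng.normal(size=(24, 3))
X = scores @ loadings.T + 0.5 * rng.normal(size=(35, 24))

pct = explained_variance_percent(X)
print(np.round(pct[:5], 1), round(pct[:3].sum(), 1))
```

Sorting in descending order matches the convention that components are extracted in order of variance explained; the cumulative sum of the leading entries gives the share retained by a given solution.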
It is worth noting that (analogously to Fig. 1) the PCA model of
the data comprises three “top” eigenvalues and a long tail of minor
components accounting for the “uncorrelated” (noisy) part of the
information. The three-component solution explains roughly 65% of
the total variance; the components are extracted in order of variance
explained: PC1 accounts for 41% of the total variance, PC2 for 13%,
and PC3 for 10%. In other words, three main order parameters
(correlated fluxes of variation) organize the mutual correlations
among gene expression values, and we must now “give a name” to
the corresponding pathways. For chemical mixtures this step is
automatic, since it follows from the known characteristic spectrum
of each molecular species; in our case, we must rely on the loadings
of each enzyme on the extracted