components. This interpretation stems from prior knowledge of the
phenomenon at hand, and thus naturally involves “subjective
judgment.” This judgment builds upon the loading matrix (Table 2),
which reports the Pearson correlation coefficients between the
original variables (probes, gene expression values) and the
extracted components. In Table 2, the most relevant loading for
each component is bolded.
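A loading matrix of this kind can be sketched in a few lines. The example below is only an illustration on synthetic data (the matrix `X`, its size, and the random seed are assumptions, not the data set discussed here): it computes each loading as the Pearson correlation between a variable and a component score, and compares the result with the closed form that holds for standardized data, eigenvector times the square root of the eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an expression matrix: 50 samples x 6 variables.
# Illustrative data only, not the data set discussed in the text.
X = rng.normal(size=(50, 6))
X[:, 1] += 0.8 * X[:, 0]  # make two of the variables correlated

# Standardize each variable so that PCA operates on the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigen-decomposition of the correlation matrix, sorted by decreasing eigenvalue.
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component scores: projection of the standardized data onto the eigenvectors.
scores = Z @ eigvecs

# Loading matrix: Pearson correlation of every variable with every component.
n_vars = Z.shape[1]
loadings = np.array([[np.corrcoef(Z[:, i], scores[:, j])[0, 1]
                      for j in range(n_vars)]
                     for i in range(n_vars)])

# For standardized data the same matrix equals eigvecs * sqrt(eigvals).
closed_form = eigvecs * np.sqrt(eigvals)

print(np.round(loadings, 2))
print("matches closed form:", np.allclose(loadings, closed_form))
```

Rows correspond to variables and columns to components, which is the layout assumed for a table of loadings; the squared loadings in each row sum to one when all components are retained.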
Looking at the loading pattern, we immediately see that all the
enzymes have positive and, with only a few exceptions, very high
correlations with PC1. This implies that PC1 is a sort of “global
repair activation”: the cell lines with higher PC1 scores are the
ones with higher repair enzyme expression, independently of the
particular mechanism involved. This is an a posteriori result: we
did not impose the existence of such a “global repair activity”; it
emerges spontaneously from the experimental data. This corresponds
to the fact that the mutagenic agent (MMS) activates the DNA repair
machinery as a whole, and this activation results in an increase in
the expression levels of all the genes coding for DNA repair
enzymes. Given that different patients have different levels of
activation of the entire DNA repair machinery, the “global
activation” corresponds to the most relevant factor in terms of
variance explained.
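To make the argument concrete, the following minimal sketch simulates the situation described above under an explicitly hypothetical model: every variable is the sum of one shared factor (called `global_activation` here) and independent noise. The number of samples, the number of variables, and the noise level are arbitrary assumptions chosen for illustration. Under this model the first component should come out with loadings of a single sign, and its scores should track the shared factor.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_genes = 200, 8  # arbitrary sizes, chosen for illustration

# Hypothetical model: one shared activation level per sample,
# plus independent gene-specific noise.
global_activation = rng.normal(size=(n_samples, 1))
expression = global_activation + 0.5 * rng.normal(size=(n_samples, n_genes))

# PCA on the correlation matrix; eigh returns eigenvalues in ascending order.
R = np.corrcoef(expression, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
pc1_vector = eigvecs[:, -1]

# The sign of an eigenvector is arbitrary: orient PC1 so its entries sum to > 0.
if pc1_vector.sum() < 0:
    pc1_vector = -pc1_vector

# Loadings of PC1 (correlations between each variable and the PC1 scores).
pc1_loadings = pc1_vector * np.sqrt(eigvals[-1])

# Share of total variance carried by PC1.
explained = eigvals[-1] / eigvals.sum()

# PC1 scores and their correlation with the simulated shared factor.
Z = (expression - expression.mean(axis=0)) / expression.std(axis=0, ddof=1)
pc1_scores = Z @ pc1_vector
corr_with_factor = np.corrcoef(pc1_scores, global_activation[:, 0])[0, 1]

is_size_component = bool(np.all(pc1_loadings > 0))
print("PC1 loadings:", np.round(pc1_loadings, 2))
print("all loadings positive:", is_size_component)
print("proportion of variance explained by PC1:", round(float(explained), 2))
print("correlation of PC1 scores with shared factor:", round(float(corr_with_factor), 2))
```

The sketch imposes the shared factor by construction, so it only shows what a global activation looks like once it is present; in the analysis described in the text, the same pattern was not imposed but found in the data.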
In statistical jargon, a component like PC1 with all loadings of
the same sign is a “size” component. This name indicates that
different PC1 scores correspond to “general changes” shared by all
the considered variables; it corresponds to what physicists call a
“mean field.” Observing a size component as the most relevant one in
terms of variance explained is a signature of a very strongly
connected system behaving as an integrated whole, in physics

Table 1
The table reports the distribution of variance explained across the principal components. The
components are in decreasing order of relevance (Eigenvalue), which in turn is normalized in terms of
proportion of explained variance (Proportion). The difference between the variance explained by
subsequent components is reported in the field “Difference,” while “Cumulative” corresponds to the
variance explained by the cumulative solutions at increasing dimensionality (number of considered
components). Bolded values mark the accepted global solution (bona fide signal).
Component   Eigenvalue   % Explained variance   % Cumulative
1           9.9          41                     41
2           3.3          13                     54
3           2.4          10                     65
4           1.4          6                      71
5           1.3          6                      77
6           1.0          4                      81
7           0.9          3                      84

64 Alessandro Giuliani
