5 Conclusions
Parameter estimation for biological systems requires continuous
feedback between biological and procedural information: data
analysis can by no means be considered a “separately optimized” set
of procedures to be applied to a set of experimental results. The
focus must be on the underlying (and largely unknown) network
linking the different players of the system at hand (in our example,
the expression levels of different genes). This network, as such, is
the only relevant “causative agent,” with the experimental observables
acting as probes of the coordinated motion of the underlying network.
This peculiar situation, which Warren Weaver named “organized
complexity” in a famous 1948 paper [15], calls for a style of
reasoning completely different from the classical approach of
biologists, who are used to a neat discrimination between dependent
and independent variables and to considering the observables as
autonomous players in the game.
Complexity can be a blessing rather than a curse if we learn how to
manage it, resisting the temptation to directly consider “all the
agents involved” in model construction.
The most fruitful way is to let the network itself suggest (e.g.,
through the application of unsupervised techniques like PCA) where to
look, thereby avoiding the overfitting/irrelevance traps.
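As a minimal sketch of what "letting the network suggest where to look" may mean in practice, the snippet below applies PCA (computed through a singular value decomposition) to a small synthetic expression matrix. The data, the sizes `n_samples` and `n_genes`, the helper `pca`, and the choice of five coordinated genes are all assumptions made for illustration and do not come from the chapter's example. The first component is expected to pick out the genes sharing a common mode of variation without any prior indication of which genes matter.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the column-centered matrix X (samples x variables)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)              # fraction of variance per component
    scores = U[:, :n_components] * s[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T               # variable weights (unit-norm columns)
    return scores, loadings, explained[:n_components]

# Synthetic data (hypothetical): genes 0-4 follow one hidden coordinated
# mode, genes 5-9 carry only independent noise.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 10
latent = rng.normal(size=n_samples)
X = rng.normal(scale=0.3, size=(n_samples, n_genes))
X[:, :5] += latent[:, None]

scores, loadings, explained = pca(X)

# Genes ranked by absolute loading on the first component.
top_genes = sorted(np.argsort(-np.abs(loadings[:, 0]))[:5].tolist())
print("fraction of variance explained:", np.round(explained, 2))
print("genes with largest |loading| on PC1:", top_genes)
```

Here the component scores act as a summary of the coordinated motion, while the loadings indicate which observables take part in it. On real data the separation is rarely this clean, and the number of components worth retaining has to be judged case by case.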
References
1. Transtrum MK et al (2015) Perspective: sloppiness and emergent theories in physics, biology and beyond. J Chem Phys 143:01091
2. Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22(1):79–86
3. Srivastava N et al (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
4. Tropsha A (2010) Best practices for QSAR model development, validation, and exploitation. Mol Inform 29(6–7):476–488
5. Pearson K (1901) On lines and planes of closest fit to systems of points in space. Lond Edinb Dubl Phil Mag J Sci 2(11):559–572
6. Giuliani A (2017) The application of principal component analysis to drug discovery and biomedical data. Drug Discov Today 22(7):1069–1076
7. Soofi E (1994) Capturing the intangible concept of information. J Am Stat Assoc 89(428):1243–1254
8. Pascual M, Levin SA (1999) From individuals to population densities: searching for the intermediate scale of nontrivial determinism. Ecology 80(7):2225–2236
9. Broomhead DS, King GP (1986) Extracting qualitative dynamics from experimental data. Physica D 20(2–3):217–236
10. Benigni R, Giuliani A (1994) Quantitative modeling and biology: the multivariate approach. Am J Phys Regul Integr Comp Phys 266(5):R1697–R1704
11. Marwan N et al (2007) Recurrence plots for the analysis of complex systems. Phys Rep 438(5):237–329
12. Kruskal JB (1964) Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1):1–27
13. Anderberg MR (2014) Cluster analysis for applications: probability and mathematical statistics: a series of monographs and textbooks, vol 19. Academic, Cambridge
14. Simonelli V et al (2016) Crosstalk between mismatch repair and base excision repair in human gastric cancer. Oncotarget 5. 10.18632/oncotarget.10185
15. Weaver W (1948) Science and complexity. Am Sci 36:536–549
68 Alessandro Giuliani