Systems Biology (Methods in Molecular Biology)


Chapter 5


The Search for System’s Parameters


Alessandro Giuliani


Abstract


The analysis of biological data asks for a delicate balance of content-specific and procedural knowledge; this
is why it is virtually impossible to apply standard mathematical and statistical recipes to systems biology.
The separation of the important part of information from singular (and largely irrelevant) details implies a
continuous interchange between biological and statistical knowledge. The generalization ability of the
models must be the principal focus of system’s parameter estimation, while the multi-scale character of
biological regulation orients the modeling style toward data-driven strategies based on the correlation
structure of the analyzed systems.


Key words: Principal component analysis, Overfitting, Soft modeling, Biological regulation

1 Sloppiness and Overfitting: The Hidden Risks of Precision


In their brilliant paper [1], James Sethna and colleagues focus on
the separation between a “stiff” and a “sloppy” part present in any
modeling effort in science. The authors support this claim with the
eigenvalue distribution of the Fisher Information Matrix [1, 2]: any
model (from physics to biology) presents the same distribution
pattern. This pattern (Fig. 1) shows a clear gap between relatively
coarse-grained but effective models (top eigenvalues, corresponding
to a high impact of parameter modifications on the predicted values:
the stiff part of the model) and a plethora of largely irrelevant
(sloppy) parameter combinations. The “sloppy” parameter combinations
(lower eigenvalues), whose modification has little or no effect on
the actual fit, can drastically reduce the generalization ability of
the model, so that the quest for maximal fit has the drawback of
generating complicated models with no relevant increase in
predictive power.
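The stiff/sloppy separation described above can be made concrete with a small numerical sketch. The example below is not taken from the chapter: it assumes a toy model, a sum of decaying exponentials y(t) = Σ exp(-θ_i t), observed with unit-variance Gaussian noise, so that the Fisher Information Matrix reduces to JᵀJ, where J is the Jacobian of the predictions with respect to the log-parameters. The function name `fisher_information`, the decay rates, and the sampling times are all hypothetical choices made for illustration.

```python
import numpy as np

def fisher_information(rates, times):
    """Fisher Information Matrix of y(t) = sum_i exp(-rate_i * t).

    Assumes (for illustration only) independent Gaussian noise of unit
    variance, so that FIM = J^T J, with derivatives taken with respect
    to log(rate_i).
    """
    rates = np.asarray(rates, dtype=float)
    times = np.asarray(times, dtype=float)
    # rt[k, i] = rate_i * t_k
    rt = np.outer(times, rates)
    # J[k, i] = d y(t_k) / d log(rate_i) = -rate_i * t_k * exp(-rate_i * t_k)
    jac = -rt * np.exp(-rt)
    return jac.T @ jac

# Hypothetical toy setting: four similar decay rates, 201 time points.
rates = [1.0, 2.0, 3.0, 4.0]
times = np.linspace(0.0, 10.0, 201)

fim = fisher_information(rates, times)

# eigvalsh returns eigenvalues of a symmetric matrix in ascending order;
# reverse them so the stiffest direction comes first.
eigvals = np.linalg.eigvalsh(fim)[::-1]
decades = np.log10(eigvals[0] / eigvals)

for k, (lam, d) in enumerate(zip(eigvals, decades), start=1):
    print(f"lambda_{k} = {lam:.3e}  ({d:.1f} decades below the largest)")
```

In this sketch the largest eigenvalues play the role of the stiff directions, while the smallest ones correspond to parameter combinations that can change substantially with almost no effect on the fitted curve. The exact spread depends on the assumed rates and sampling times.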
Figure 1 can be interpreted (without appreciable loss of generality)
as referring to models of the kind $Y = f(x_1, x_2, x_3, \ldots, x_n)$,
where Y is the dependent variable we want to model in terms of
n independent variables ($x_1, \ldots, x_n$), the parameters are the

Mariano Bizzarri (ed.), Systems Biology, Methods in Molecular Biology, vol. 1702,
https://doi.org/10.1007/978-1-4939-7456-6_5, © Springer Science+Business Media LLC 2018

