predictions and measurements create data could benefit from data assimilation techniques, but so far that potential has been little explored. Each context will raise its own mathematical demands.
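To give a concrete sense of the core idea, here is a minimal sketch in Python of one classical building block of data assimilation, a one-dimensional Kalman filter update, which blends a model forecast with a measurement by weighting each according to its uncertainty. All of the names and numbers are invented for illustration; real systems apply this logic to millions of interdependent variables.

    # A minimal, hypothetical sketch of one data assimilation step:
    # a one-dimensional Kalman filter update.

    def kalman_update(forecast, forecast_var, obs, obs_var):
        """Blend a model forecast with a measurement, weighting each
        by its variance; return the estimate and its reduced variance."""
        gain = forecast_var / (forecast_var + obs_var)  # trust in the data
        estimate = forecast + gain * (obs - forecast)
        estimate_var = (1.0 - gain) * forecast_var
        return estimate, estimate_var

    # Illustrative numbers only: a model forecasts 15.0 degrees C with
    # variance 4.0, while a thermometer reads 14.2 with variance 1.0.
    est, var = kalman_update(15.0, 4.0, 14.2, 1.0)
    print(f"assimilated estimate: {est:.2f} C (variance {var:.2f})")

In this toy case the blended variance (0.80) is smaller than either input variance alone, which is the point of assimilation: prediction and measurement together constrain the state better than either does by itself.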
Data assimilation has hardly been used at all in the context of climate models, yet some researchers believe it holds great promise there. One of the critical problems in climate modeling that statistical methods, whether data assimilation or other methods, are needed to address is assessing how certain the models’ predictions are. Currently, the uncertainty is estimated in a very ad hoc way: Different modeling centers build different models that take somewhat different approaches, and the spread of those models’ predictions is presumed to give a reasonable sense of the degree of certainty. If the models vary in their predictions of average global temperature in 2100 by, say, 5 degrees Celsius, the best prediction is imagined to lie somewhere within that range, and the 5-degree spread is taken to describe, roughly, the range of temperatures that might actually occur. But while the models certainly do help us understand how the climate is likely to behave, there is little reason to believe that the spread between the models faithfully represents the range of possibilities. An alternative approach would be to assess the uncertainty in each piece of the model separately, along with the uncertainty in the data itself. Statistical methods could then combine the uncertainty estimates from each model piece and from the data into an objective, unbiased assessment of the overall uncertainty. But this approach has yet to be developed.
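To make the contrast concrete, here is a short sketch in Python. Everything in it is hypothetical: the model predictions, the component structure, and the standard deviations are invented for illustration only. The first few lines mimic the ad hoc multi-model spread; the rest propagate assumed uncertainties in individual model components and in the data through a simple Monte Carlo simulation.

    import random
    import statistics

    # Ad hoc approach: treat the spread across different models'
    # predictions as the uncertainty. (Invented numbers.)
    model_predictions = [2.1, 3.4, 4.0, 5.8, 7.1]  # warming by 2100, deg C
    spread = max(model_predictions) - min(model_predictions)
    print(f"multi-model spread: {spread:.1f} C")

    # Sketched alternative: assign an uncertainty to each piece of the
    # model and to the data, then propagate them all together. The
    # component structure and standard deviations are purely hypothetical.
    random.seed(0)

    def one_simulation():
        forcing = random.gauss(1.0, 0.15)    # uncertain model component
        feedback = random.gauss(1.3, 0.30)   # uncertain model component
        data_bias = random.gauss(0.0, 0.20)  # uncertainty in the data
        return 3.0 * forcing * feedback + data_bias  # toy combination

    samples = [one_simulation() for _ in range(100_000)]
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    print(f"propagated estimate: {mean:.1f} C +/- {sd:.1f} C (1 sigma)")

The point of the sketch is structural: the second estimate is traceable to explicit, testable assumptions about each source of uncertainty, whereas the multi-model spread cannot be decomposed or audited in that way.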
Sustainability issues raise all kinds of data-modeling challenges like this, more than most areas of science do. Because sustainability problems deal with complex
natural systems, understanding them requires lots of data, and the data are
never as tidy, reliable, consistent, or meaningful as is needed. So when scientists
march out and install thermometers, count tree species, drill ice cores, and tally
malaria cases, filling their hard drives with millions and billions and trillions of
data points, they usually find that they don’t have exactly the information that
they need when they bring those hard drives back to the lab. They then turn to statisticians or other mathematical scientists and ask them how to manipulate the data into the necessary form, but often the mathematical tools needed for the job haven’t yet been invented.
Part of the problem is that sustainability issues often require merging
datasets produced at different times for different purposes. In the U.S., for