230 Recent Developments in Density Forecasting
likelihood as an out-of-sample measure of fit relies on the prior being informative
(see Eklund and Karlsson, 2007). When an uninformative prior is used, as is com-
mon (see Fernandez, Ley and Steel 2001), the marginal likelihood reduces to an
in-sample measure of fit. The relationship between in-sample (system) fit and the
expected forecasting performance of the variable of interest is also lost in multivari-
ate forecasting models, prompting Andersson and Karlsson (2007) to suggest use of
the predictive likelihood for forecast combination since the univariate predictive
density of interest can be readily simulated.
These out-of-sample measures of fit involve splitting the available sample
($t = 1, \ldots, T$) in two and measuring the fit of the models by how well
recursively computed $h$-step ahead forecasts perform, as judged by the marginal
likelihood or logarithmic score, over a hold-out (or predictive) period
($t = t_0, \ldots, T$). Importantly, this means the measure of fit varies
according to $h$. Empirically, the size of the hold-out period also matters.
Theoretically, as the size of the hold-out period increases we should expect the
weights $w_{i,\mathrm{BMA}}$ based on the predictive likelihood to select the
correct model consistently (Eklund and Karlsson, 2007). But
there is a trade-off when selecting the size of the hold-out period. A short
hold-out period lets the weights adapt quickly to change, whereas a longer one
means the weights are better estimated. Pesaran and Timmermann (2007) have
established the optimal
trade-off between bias and forecast error variance in regression models subject to
one or more structural breaks. Similarly when density forecasting, since economic
time-series are known to exhibit structural breaks (Stock and Watson, 1996), unless
the true model is in the set of models under consideration we might expect the
ranking of the $N$ models to vary over time. This might make it advantageous to
consider selecting the size of the hold-out period using the data.
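
As an illustration of how the length of the hold-out period affects the weights, the following is a minimal sketch rather than the procedure of any of the papers cited. It assumes the weights are proportional to the exponentiated sum of each model's logarithmic scores over the hold-out period. The function name `log_score_weights`, the `window` argument and the toy Gaussian forecasts are hypothetical and chosen only for this example.

```python
import math
from statistics import NormalDist


def log_score_weights(log_scores, window=None):
    """Combination weights from hold-out logarithmic scores.

    log_scores[i][t] is the log predictive density of model i evaluated at
    the outcome in hold-out period t. If `window` is given, only the most
    recent `window` periods are used (an assumed, illustrative device).
    """
    if window is not None and window < 1:
        raise ValueError("window must be at least 1")
    totals = []
    for scores in log_scores:
        used = scores if window is None else scores[-window:]
        totals.append(sum(used))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(totals)
    unnormalised = [math.exp(s - top) for s in totals]
    norm = sum(unnormalised)
    return [u / norm for u in unnormalised]


# Toy data (assumed): the outcomes shift upwards halfway through,
# mimicking a structural break.
outcomes = [0.1, -0.3, 0.2, 1.9, 2.2, 2.0]
model_a = [NormalDist(0.0, 1.0)] * len(outcomes)  # fits the early regime
model_b = [NormalDist(2.0, 1.0)] * len(outcomes)  # fits the late regime

log_scores = [
    [math.log(density.pdf(y)) for density, y in zip(model, outcomes)]
    for model in (model_a, model_b)
]

full_weights = log_score_weights(log_scores)              # whole hold-out period
recent_weights = log_score_weights(log_scores, window=3)  # last three periods
print(full_weights, recent_weights)
```

In this toy case the short window puts nearly all weight on the model that fits the post-break data, while the full hold-out period leaves the two models with similar weights. This is the adaptation side of the trade-off. The estimation-noise side is not visible in so small an example.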
Out-of-sample measures of fit, based on the logarithmic score, can serve as the
basis for density forecast combination whether the forecasts are model-based or
subjectively formed (see Hall and Mitchell, 2007; Jore, Mitchell and Vahey, 2008).
All that is required is the history of the density forecasts. Alternatively, Mitchell
and Hall (2005) advocate density forecast combination using what they call ‘KLIC’
weights, which use the pits rather than the logarithmic score, measured over a
hold-out period, to measure fit. This involves using each model’s
$\mathrm{KLIC}^i_{t-h}$ value (see (5.29)), computed recursively over the
out-of-sample window, to determine
$\Delta_i = \mathrm{KLIC}^i_{t-h} - \min(\mathrm{KLIC}^i_{t-h})$ in (5.50).
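
A rough sketch of how pit-based weights of this kind might be computed is given below. It rests on several assumptions that this passage does not confirm. The KLIC is estimated by transforming the pits to normal scores and comparing a fitted Gaussian with the standard normal, ignoring any serial dependence. The weights are taken to be proportional to the exponential of minus the KLIC difference, since equation (5.50) is not reproduced here. The names `klic_from_pits` and `klic_weights` and the toy pits are hypothetical.

```python
import math
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist(0.0, 1.0)


def klic_from_pits(pits):
    """Estimate a model's KLIC from its pits (each strictly between 0 and 1).

    Assumed approach: map the pits to normal scores, fit N(mu, sigma^2) by
    maximum likelihood, and average the log density ratio against N(0, 1).
    """
    z = [STD_NORMAL.inv_cdf(u) for u in pits]
    mu = fmean(z)
    sigma = math.sqrt(fmean([(v - mu) ** 2 for v in z]))
    fitted = NormalDist(mu, sigma)
    return fmean(
        [math.log(fitted.pdf(v)) - math.log(STD_NORMAL.pdf(v)) for v in z]
    )


def klic_weights(klics):
    """Weights from KLIC differences relative to the best model.

    The exponential form is an assumption made for illustration.
    """
    best = min(klics)
    unnormalised = [math.exp(-(k - best)) for k in klics]
    norm = sum(unnormalised)
    return [u / norm for u in unnormalised]


# Toy pits (assumed): the first set is evenly spread over the unit interval,
# the second is bunched in the upper tail, indicating poor calibration.
pits_calibrated = [(k + 0.5) / 10 for k in range(10)]
pits_miscalibrated = [0.80 + 0.018 * k for k in range(10)]

klics = [klic_from_pits(pits_calibrated), klic_from_pits(pits_miscalibrated)]
weights = klic_weights(klics)
print(klics, weights)
```

The model whose pits are closer to uniform has the smaller estimated KLIC and so receives the larger weight. In a recursive exercise both functions would be re-evaluated each period using only the pits available up to $t-h$.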
5.5.4.3 Empirical applications combining and evaluating density forecasts
Despite considerable experience combining point forecasts, and an extensive BMA
literature, there has been little applied macroeconomic work devoted explicitly
to density forecast combination and evaluation. For example, in his review of
the available empirical literature on forecast combination, Timmermann (2006)
focuses on point forecasts. Therefore, a consensus about if and when density fore-
cast combination “works” in macroeconomics has yet to emerge. Nevertheless, a
tentative start has been made (see Mitchell and Hall, 2005; Hall and Mitchell, 2007;
Jore, Mitchell and Vahey, 2008). An early suggestion is that there can be substantial