Stephen G. Hall and James Mitchell 229
to the truth given the data (posterior probabilities) when attaching equal (prior)
weight to each model, which a Bayesian would term non-informative priors (see
Burnham and Anderson, 2002). Minimizing the Akaike criterion is approximately
equivalent to minimizing the expected Kullback–Leibler distance between the true
density and the estimated density; again see Burnham and Anderson (2002, Ch. 2,
Ch. 6).
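To illustrate the equal-prior weighting above, the approximate posterior model probabilities implied by the Akaike criterion can be computed as so-called Akaike weights, w_i ∝ exp(−Δ_i/2), where Δ_i is the difference between model i's AIC and the minimum AIC (Burnham and Anderson, 2002). A minimal sketch, with made-up AIC values purely for illustration:

```python
import math

def akaike_weights(aics):
    """Approximate posterior model probabilities under equal (non-informative)
    prior weights, computed from AIC differences (Burnham and Anderson, 2002)."""
    best = min(aics)
    # Delta_i = AIC_i - min(AIC); weight_i is proportional to exp(-Delta_i / 2)
    rel = [math.exp(-(a - best) / 2.0) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AIC values for three competing models
weights = akaike_weights([100.0, 102.0, 110.0])
```

The weights sum to one, and the model with the smallest AIC receives the largest weight; an AIC difference of 2 translates into a weight ratio of exp(1) ≈ 2.72 in its favour.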
The combined density forecast p_BMA(y_{t|t−h}) also has established optimality
properties given the set of models under consideration (see Madigan and Raftery,
1994; Raftery and Zheng, 2003). The central estimate from p_BMA(y_{t|t−h})
minimizes mean squared error, the prediction intervals are well calibrated and
p_BMA(y_{t|t−h}) maximizes the logarithmic score given Pr(S_i). On this basis the
combined density cannot provide worse forecasts (in-sample, t = 1, ..., T), as eval-
uated by the average logarithmic score, than the best individual forecast. This
follows from:
KLIC_{it} = E[ ln f(y_t | Ω_{t−h}) − ln g(y_{it|t−h}) ] ≥ 0,   (5.51)

⇒ E(ln p_BMA(y_{t|t−h})) ≥ E(ln g(y_{it|t−h})) ⇔ KLIC_t ≤ KLIC_{it|t−h},   (5.52)
(i = 1, ..., N; t = 1, ..., T). However, BMA implicitly assumes that all the models
under consideration are stable. When they are not, perhaps if there are struc-
tural breaks, and when the set of models under consideration is not exhaustive
(and, therefore, does not include the true model), non-Bayesian weights, like equal
weights or w_i^*, might be more appropriate and, e.g., deliver a higher log score
(see Geweke and Amisano, 2008). In the presence of unknown structural breaks,
Pesaran and Timmermann (2007) proved that it can be helpful to average not just
over different models, but over different estimation windows for a given model;
Assenmacher-Wesche and Pesaran (2008) found equal weights performed best (i.e.,
delivered the lowest RMSE) in an application to the Swiss economy.
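A small simulation, with made-up Gaussian density forecasts, illustrates the mechanics of the log score comparison underlying (5.51)–(5.52). By the concavity of the log, a linear pool's average log score can never fall below the weighted average of its components' average log scores (Jensen's inequality); this is the channel through which combination protects against poor individual forecasts, although the stronger claim that BMA matches the best individual relies on the weights being posterior probabilities:

```python
import math
import random

def log_score(density, ys):
    """Average logarithmic score of a density forecast over realisations ys."""
    return sum(math.log(density(y)) for y in ys) / len(ys)

def normal_pdf(y, mu, sigma):
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

random.seed(0)
# Hypothetical data-generating process and two (misspecified) Gaussian forecasts
ys = [random.gauss(0.0, 1.0) for _ in range(2000)]
g1 = lambda y: normal_pdf(y, 0.5, 1.0)   # biased mean
g2 = lambda y: normal_pdf(y, 0.0, 2.0)   # too dispersed
w = 0.5                                  # equal weights
pool = lambda y: w * g1(y) + (1 - w) * g2(y)

# By the concavity of the log, the pool's average log score is at least the
# weighted average of the individual average log scores (Jensen's inequality)
assert log_score(pool, ys) >= w * log_score(g1, ys) + (1 - w) * log_score(g2, ys)
```

The inequality holds pointwise for every realisation y, so it holds for the in-sample average as well, whatever weights w ∈ (0, 1) are chosen.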
In contrast to methods designed to combine point forecasts (cf. Bates and
Granger, 1969), the weights w_i^* or w_i^BMA do not allow explicitly for correlation
between forecasts. One possibility for future research is to consider the use of cop-
ula functions to account for the dependence between the density forecasts (see
Jouini and Clemen, 1996; Mitchell, 2007a).
5.5.4.2 Out-of-sample measures of fit
To protect against in-sample overfitting, predictive, rather than in-sample
(likelihood-based), measures of fit have also been proposed as the basis for forecast
combination (see Eklund and Karlsson, 2007; Kapetanios, Labhard and Price, 2006;
Andersson and Karlsson, 2007), although these papers do not consider forecast
density evaluation. However, in a specific sense, the marginal likelihood can be
interpreted as a measure of out-of-sample predictive performance, as well as a mea-
sure of in-sample fit. This is because the marginal likelihood can be written, as
seen, as the product of one-step-ahead predictive densities; also see Geweke and
Whiteman (2006, pp. 15–17). However, it cannot be decomposed directly into
the product ofh-step-ahead density forecasts. Moreover, to interpret the marginal