226 Recent Developments in Density Forecasting
the so-called regression approach, is to tune the weights to reflect the historical
performance of the competing forecasts (e.g., see Granger and Ramanathan, 1984).
Choosing the weights via OLS estimation of the realizations of the variable on the
competing point forecasts is “optimal,” given quadratic loss; the optimal weighted
combination of the point forecasts is the most “accurate” point forecast, in the
sense of minimum RMSE. In the following section we consider extensions to
density forecasts that, essentially, involve choosing the weights to maximize the
in-sample or out-of-sample (predictive) “fit” of (5.40).
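
The regression approach described above can be sketched in a few lines. This is a minimal illustration only: the function names `ols_combination_weights` and `rmse`, the optional intercept, and the simulated data are assumptions made for the example and are not taken from Granger and Ramanathan (1984).

```python
import numpy as np

def ols_combination_weights(y, forecasts, intercept=True):
    """Regress realizations y (length T) on competing point forecasts (T x N).

    Returns the OLS coefficients; with intercept=True the first element is
    the constant and the remaining N elements are the combination weights.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(forecasts, dtype=float)
    if intercept:
        X = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def rmse(y, yhat):
    """Root mean squared error of the forecast yhat for realizations y."""
    return float(np.sqrt(np.mean((np.asarray(y) - np.asarray(yhat)) ** 2)))

# Simulated (hypothetical) data: two noisy forecasts of the same variable.
rng = np.random.default_rng(0)
T = 500
y = rng.normal(size=T)
f1 = y + rng.normal(scale=0.5, size=T)
f2 = y + rng.normal(scale=1.0, size=T)
F = np.column_stack([f1, f2])

coef = ols_combination_weights(y, F)
combined = coef[0] + F @ coef[1:]

print("weights:", coef)
print("RMSE f1, f2, combined:", rmse(y, f1), rmse(y, f2), rmse(y, combined))
```

In-sample, the OLS combination cannot have a larger RMSE than either individual forecast, because each individual forecast is a special case of the regression; out of sample no such guarantee holds.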
5.5.4 KLIC minimizing weights
How we measure the accuracy of forecasts is central to how we might choose
to combine them. Similar to how RMSE (least squares) has been the historical
basis for much analysis of point forecasts, Mitchell and Hall (2005) and Hall and
Mitchell (2007) suggest that the KLIC can serve as the basis for density forecast
evaluation and combination. The KLIC offers a unifying
framework in which to consider choosing the combination weights.
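The KLIC distance between the true density $f(y_t \mid \Omega_{t-h})$ and the combined density
forecast $p(y_t \mid t-h)$ ($t = 1, \ldots, T$) is defined as:
\[
\mathrm{KLIC}_{t|t-h} = \int f(y_t \mid \Omega_{t-h}) \ln \left\{ \frac{f(y_t \mid \Omega_{t-h})}{p(y_t \mid t-h)} \right\} dy_t
= E\left[ \ln f(y_t \mid \Omega_{t-h}) - \ln p(y_t \mid t-h) \right]. \tag{5.44}
\]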
The smaller this distance, the closer the density forecast to the true density.
$\mathrm{KLIC}_{t|t-h} = 0$ if and only if $f(y_t \mid \Omega_{t-h}) = p(y_t \mid t-h)$, which is an “average form”
of the rational expectations hypothesis (see Pesaran and Weale, 2006, p. 722).
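
As a numerical illustration of the KLIC distance, the sketch below evaluates the integral on a grid for two Gaussian densities and compares it with the standard closed-form Kullback-Leibler divergence between two normals. The choice of Gaussian densities, their parameters, and the helper names `normal_pdf` and `klic` are assumptions made purely for the example; in practice the true density is unknown.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def klic(f_pdf, p_pdf, grid):
    """Approximate the integral of f * ln(f / p) by a Riemann sum on an even grid."""
    f = f_pdf(grid)
    p = p_pdf(grid)
    dx = grid[1] - grid[0]
    return float(np.sum(f * (np.log(f) - np.log(p))) * dx)

# Hypothetical "true" density f = N(0, 1) and density forecast p = N(0.5, 1.5^2).
m0, s0 = 0.0, 1.0
m1, s1 = 0.5, 1.5
f = lambda y: normal_pdf(y, m0, s0)
p = lambda y: normal_pdf(y, m1, s1)

grid = np.linspace(-12.0, 12.0, 24001)
numeric = klic(f, p, grid)

# Closed-form KL divergence between two univariate normals, for comparison.
closed_form = np.log(s1 / s0) + (s0**2 + (m0 - m1) ** 2) / (2.0 * s1**2) - 0.5

print("numerical KLIC:  ", numeric)
print("closed-form KLIC:", closed_form)
print("KLIC of f against itself:", klic(f, f, grid))
```

The distance is zero when the forecast coincides with the true density and strictly positive otherwise.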
Given this loss function, Hall and Mitchell (2007) define the “optimal”
combined density forecast as:
\[
p^*(y_t \mid t-h) = \sum_{i=1}^{N} w_i^* \, g(y_t \mid i, t-h), \tag{5.45}
\]
where the optimal weight vector $w^* = (w_1^*, \ldots, w_N^*)$ minimizes the KLIC distance
between the combined and true density, (5.44). This minimization is achieved as
follows:
\[
w^* = \arg\max_{w} \frac{1}{T} \sum_{t=1}^{T} \ln p(y_t \mid t-h), \tag{5.46}
\]
where $\frac{1}{T} \sum_{t=1}^{T} \ln p(y_t \mid t-h)$ is the average logarithmic score of the combined
density forecast over the sample $t = 1, \ldots, T$; for related discussion in terms
of quasi-maximum likelihood estimation, see White (1982). For an analytical
discussion of “optimal” pooling using (5.46), see Geweke and Amisano (2008).
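
The maximization of the average logarithmic score over the weights can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimation procedure of Hall and Mitchell (2007): the weights are assumed to be non-negative and to sum to one, the maximization uses an EM-type fixed-point update (a swapped-in technique chosen because it needs only NumPy), and the function names and simulated data are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def average_log_score(weights, dens):
    """Average log score of the combination; dens[t, i] is g_i evaluated at y_t."""
    return float(np.mean(np.log(dens @ weights)))

def log_score_weights(dens, n_iter=500, tol=1e-12):
    """EM-type iteration for weights on the unit simplex (assumed constraint).

    Each update multiplies w_i by the sample mean of g_i(y_t) / p(y_t), which
    keeps the weights non-negative, preserves their sum, and does not lower
    the average log score.
    """
    dens = np.asarray(dens, dtype=float)
    n_forecasts = dens.shape[1]
    w = np.full(n_forecasts, 1.0 / n_forecasts)  # start from equal weights
    for _ in range(n_iter):
        mix = dens @ w  # combined density evaluated at each realization
        w_new = w * np.mean(dens / mix[:, None], axis=0)
        converged = np.max(np.abs(w_new - w)) < tol
        w = w_new
        if converged:
            break
    return w

# Simulated (hypothetical) data: realizations drawn from a 0.7/0.3 mixture of
# the two component density forecasts N(-1, 1) and N(2, 1).
rng = np.random.default_rng(1)
T = 2000
from_first = rng.random(T) < 0.7
y = np.where(from_first, rng.normal(-1.0, 1.0, T), rng.normal(2.0, 1.0, T))

dens = np.column_stack([normal_pdf(y, -1.0, 1.0), normal_pdf(y, 2.0, 1.0)])
w_hat = log_score_weights(dens)

print("estimated weights:", w_hat)
print("log score, equal weights:    ", average_log_score(np.array([0.5, 0.5]), dens))
print("log score, estimated weights:", average_log_score(w_hat, dens))
```

Only the component densities evaluated at the realizations enter the calculation; the true density is never required.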
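Because the term $E[\ln f(y_t \mid \Omega_{t-h})]$ in (5.44) does not depend on the weights,
minimizing the KLIC distance is equivalent to maximizing the expected logarithmic
score. This is convenient as it avoids having to postulate and estimate
$f(y_t \mid \Omega_{t-h})$, which is unknown. At
the expense of having to make an assumption about the form of q(.), Hall and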
Mitchell (2007) do consider how, for those goodness-of-fit tests directly related to