174 Forecast Combination and Encompassing
The above methods of combining forecasts are simple and straightforward to
implement, but the literature also contains many extensions to these simple
approaches, the relatively early contributions of which are summarized by Clemen
(1989). For example, Diebold (1988) and Coulson and Robins (1993) argue that,
when estimating weights based on (4.9), account should be taken of the likely
autocorrelation present in εt, either by allowing for ARMA residuals or including
yt−1 as an additional regressor. The possibility of time-varying combination
weights could also be entertained, reflecting the potentially evolving behavior of
the process governing the actuals, and also of the individual forecasters. At a simple
level, this might involve using recent data only to estimate the combination
weights, while more sophisticated approaches are proposed by Diebold and Pauly
(1987), LeSage and Magura (1992), and Deutsch, Granger and Teräsvirta (1994).
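As a simple illustration of these ideas, the sketch below estimates combination weights by least squares over a rolling window of recent data, with the lagged actual included as an additional regressor to absorb residual autocorrelation. This is only a stylized version of the regression-based approach described above: the unrestricted-regression form, the window length, and all variable names are assumptions for illustration, not the text's equation (4.9).

```python
import numpy as np

def rolling_combination_weights(y, f1, f2, window=60):
    """Estimate time-varying combination weights by OLS over a rolling
    window, regressing the actuals on a constant, the two forecasts,
    and the lagged actual (the extra regressor suggested as a remedy
    for autocorrelated combination errors).  Illustrative sketch only.
    Returns one coefficient vector per window endpoint."""
    weights = []
    for t in range(window + 1, len(y)):
        # Design matrix: constant, f1, f2, and y_{t-1} as an extra regressor
        X = np.column_stack([
            np.ones(window),
            f1[t - window:t],
            f2[t - window:t],
            y[t - window - 1:t - 1],   # lagged actuals
        ])
        beta, *_ = np.linalg.lstsq(X, y[t - window:t], rcond=None)
        weights.append(beta)
    return np.array(weights)
```

Using only the most recent `window` observations is the crudest way to let the weights evolve; the papers cited above develop more refined time-varying schemes.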
Relatively sophisticated methods of forecast combination inherently entail a
greater data requirement, and are therefore most applicable when a reasonably
long history of forecast performance is available. When, as is very often the case,
only small samples of historical data exist, sampling variability plays a significant
role in the estimation of the combination weights. This can temper the gains that
could be realized relative to when the weights are known, potentially even giv-
ing rise to forecast combinations that are less accurate than simpler approaches
that do not require combination weight estimation. For example, many authors
(see, e.g., Makridakis and Winkler, 1983; Stock and Watson, 1999; Fildes and Ord,
2002) have found that simple averaging of individual forecasts very often outper-
forms more elaborate combination techniques, while Harvey and Newbold (2005)
demonstrate that situations exist where the optimal weight on, say, f2t is non-zero,
but sampling variability affects the weight estimates to the extent that the resulting
combination has a larger MSFE than that associated with just f1t alone. The
Bayesian combination methods of, inter alia, Clemen and Winkler (1986), Diebold
and Pauly (1990) and Min and Zellner (1993) provide a means of formally estimat-
ing the combination weights, while mitigating the effects of sampling variability
by shrinking the weights towards some prior mean. The widely observed robust
performance of simple averages of forecasts motivates a prior of equal weights in
this setting.
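The idea of shrinking estimated weights towards an equal-weight prior can be sketched in a few lines. The linear shrinkage below is a stylized device chosen for illustration, not the specific Bayesian posterior of any of the papers cited above; the shrinkage parameter `lam` and all names are assumptions.

```python
import numpy as np

def shrunk_combination_weights(y, F, lam=0.5):
    """Shrink least-squares combination weights towards the equal-weight
    prior 1/k.  lam in [0, 1]: lam = 0 gives the unconstrained OLS
    weights, lam = 1 gives the simple average.  A stylized sketch of
    shrinkage, not a specific posterior from the cited literature."""
    n, k = F.shape                                # F: n obs on k forecasts
    ols, *_ = np.linalg.lstsq(F, y, rcond=None)   # unconstrained weights
    prior = np.full(k, 1.0 / k)                   # equal-weight prior mean
    return (1.0 - lam) * ols + lam * prior
```

Pulling the weights towards 1/k trades a little bias for a reduction in the estimation variance that, as discussed above, can otherwise make estimated-weight combinations inferior to the simple average.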
Another extension to combining forecasts is to allow nonlinear combination
methods. Such methods may be useful when relatively large samples of forecasts
are available, and/or when the nature of the forecasts suggests methods other than
linear combination. Given a large number of forecasts, an attractive way of consid-
ering nonlinear combination schemes is via Artificial Neural Networks (ANNs), as
ANNs are able to approximate large classes of nonlinear functions. Donaldson and
Kamstra (1996) use single hidden-layer ANNs to combine forecasts of the volatil-
ity of daily stock returns from GARCH and moving-average variance models, and
compare the results to traditional linear combination. Specifically, the ANNs are of
the form:
fct = α + ∑_{j=1}^{k} βj fjt + ∑_{i=1}^{p} δi g(zt γi),