Stephen G. Hall and James Mitchell 215
5.4.2.1 Goodness-of-fit tests: in theory
Diebold, Gunther and Tay (1998) popularized the idea in economics of statistically
evaluating a sample of density forecasts based on the probability integral
transforms (pit’s) of the realization of the variable with respect to the forecast
densities. An alternative approach is based on the integrated squared difference
between the density forecast and a nonparametric estimate of $f(y_t \mid \Omega_{t-h})$; see Li
and Tkacz (2006). We focus on the former approach since it does not require the
strict stationarity of $y_t$.
Diebold, Gunther and Tay (1998) proved that a sequence of estimated $h$-step-ahead
density forecasts, $\{g(y_t \mid \Omega_{t-h})\}_{t=1}^{T}$, for the realizations of the process $\{y_t\}_{t=1}^{T}$,
coincides with the (unknown) true densities $\{f(y_t \mid \Omega_{t-h})\}_{t=1}^{T}$ when the sequence
of pit’s, $z_{t|t-h}$, are uniform variates, where:^4

$$z_{t|t-h} = \int_{-\infty}^{y_t} g(u \mid \Omega_{t-h})\,du = G(y_t \mid \Omega_{t-h}); \quad (t = 1, \ldots, T). \tag{5.25}$$
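As a minimal sketch of equation (5.25), assume for illustration that each forecast density is Gaussian, so that the pit is simply the forecast's normal cumulative distribution function evaluated at the realization. The function name `pit_gaussian` and the numbers below are hypothetical and are not taken from the studies cited.

```python
from statistics import NormalDist

def pit_gaussian(realizations, means, stdevs):
    """Return z_t = G(y_t | information set) for Gaussian forecast densities.

    Assumption for illustration only: each density forecast is N(mean, stdev^2).
    """
    return [NormalDist(m, s).cdf(y)
            for y, m, s in zip(realizations, means, stdevs)]

# Hypothetical realizations and forecast parameters (not data from the text).
y = [0.0, 1.0, -1.0]
mu = [0.0, 0.0, 0.0]
sigma = [1.0, 1.0, 1.0]

z = pit_gaussian(y, mu, sigma)
print(z)
```

A realization at the forecast median maps to a pit of 0.5, while realizations in the tails of the forecast density map to values near 0 or 1.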
Since the correct density forecast will be preferred by all users, irrespective of
their loss function, testing the pit’s is attractive as it offers a means of evaluating
forecasts without the need to specify a loss function. This is convenient given
that it is hard to define an appropriate general (economic) loss function, although
it is sometimes possible: Clements (2004) provides an evaluation of the Bank of
England’s fan charts for inflation based on economic as well as statistical loss.
But just as a “good” interval forecast should be correctly calibrated both unconditionally
and conditionally, so should a “good” density forecast. This translates into
the requirement that, when $h = 1$, $z_{t|t-h}$ is not just uniform but also independently
distributed. In other words, one-step-ahead density forecasts are optimal and capture
all aspects of the distribution of $y_t$ only when the $z_{t|t-1}$ are independently and
uniformly distributed. When $h > 1$ we should expect serial dependence in $z_{t|t-h}$
even for correctly specified density forecasts. Again this is analogous to expecting
dependence (an MA($h-1$) process) when evaluating a sequence of optimal rolling
$h$-step-ahead point forecasts. There is not, however, a one-for-one relationship
between the point forecast errors and $z_{t|t-h}$.
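The serial dependence expected at horizons beyond one can be illustrated by simulation. The sketch below assumes a Gaussian AR(1) data-generating process with known coefficient, so that the one-step-ahead and two-step-ahead forecast densities are both correctly specified; the process, the parameter values and the helper `autocorr` are illustrative assumptions, not part of the cited work.

```python
import random
from statistics import NormalDist

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    cov = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return cov / var

# Assumed DGP: y_t = phi * y_{t-1} + e_t with e_t ~ N(0, 1).
rng = random.Random(0)
phi, T = 0.8, 5000
y = [0.0]
for _ in range(T + 1):
    y.append(phi * y[-1] + rng.gauss(0.0, 1.0))

std_normal = NormalDist()

# Correct one-step-ahead density: N(phi * y_{t-1}, 1).
z1 = [std_normal.cdf(y[t] - phi * y[t - 1]) for t in range(2, len(y))]

# Correct two-step-ahead density: N(phi^2 * y_{t-2}, 1 + phi^2).
s2 = (1.0 + phi ** 2) ** 0.5
z2 = [std_normal.cdf((y[t] - phi ** 2 * y[t - 2]) / s2)
      for t in range(2, len(y))]

r1_h1 = autocorr(z1, 1)   # one-step pits: expected to be close to zero
r1_h2 = autocorr(z2, 1)   # two-step pits: expected to be clearly positive
r2_h2 = autocorr(z2, 2)   # two-step pits: expected to be close to zero
print(r1_h1, r1_h2, r2_h2)
```

Under these assumptions the two-step-ahead forecast errors follow an MA(1) process, so the two-step pits should show dependence at lag 1 but not at lag 2, even though the density forecasts are correct.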
It is important, as stressed by Mitchell and Wallis (2008), to test density forecasts
not just unconditionally, via a distributional test, but also conditionally, via a
test for independence. Otherwise one finds, as in Gneiting, Balabdaoui and
Raftery (2007), where it motivates their advocacy of scoring rules, that uniformity
of the pit’s is a necessary but not sufficient condition for optimal density forecasts.
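A small simulation can illustrate why uniformity alone is not sufficient. The sketch below assumes a stationary Gaussian AR(1) process and compares two forecasters: one who always issues the unconditional density, and one who issues the correct conditional density. The Kolmogorov–Smirnov distance from the uniform distribution and the lag-1 autocorrelation are computed by hand; the process, parameter values and helper names are illustrative assumptions.

```python
import random
from statistics import NormalDist

def ks_uniform(z):
    """Kolmogorov-Smirnov distance between the empirical cdf of z and U(0, 1)."""
    zs = sorted(z)
    n = len(zs)
    return max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(zs))

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    var = sum((v - m) ** 2 for v in x)
    cov = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return cov / var

# Assumed DGP: stationary AR(1), y_t = phi * y_{t-1} + e_t, e_t ~ N(0, 1).
rng = random.Random(1)
phi, T = 0.5, 5000
sd_uncond = (1.0 / (1.0 - phi ** 2)) ** 0.5
y = [rng.gauss(0.0, sd_uncond)]          # start from the stationary distribution
for _ in range(T):
    y.append(phi * y[-1] + rng.gauss(0.0, 1.0))

uncond = NormalDist(0.0, sd_uncond)       # same density issued every period
std_normal = NormalDist()

z_uncond = [uncond.cdf(y[t]) for t in range(1, len(y))]
z_cond = [std_normal.cdf(y[t] - phi * y[t - 1]) for t in range(1, len(y))]

for name, z in (("unconditional", z_uncond), ("conditional", z_cond)):
    print(name, round(ks_uniform(z), 3), round(autocorr(z, 1), 3))
```

In this setting both sets of pit’s should lie close to the uniform distribution, but only the pit’s from the conditional forecasts should be free of serial correlation; a distributional test applied on its own would not separate the two forecasters. Note that the usual Kolmogorov–Smirnov critical values assume independence, so the distance is reported here only as a descriptive measure.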
5.4.2.2 Goodness-of-fit tests: in practice
Following the lead of Diebold, Gunther and Tay (1998), evaluation tests are commonly
based on the difference between the empirical distribution of $z_{t|t-h}$ and the
cumulative distribution function of a uniform random variable on [0,1], i.e., the
45° line. In many empirical studies, this has simply involved the application of a
Kolmogorov–Smirnov or Anderson–Darling test for uniformity. For one-step-ahead
forecasts this is often supplemented with a separate test for the independence of