YVES CRAMA, GEORGES HÜBNER AND JEAN-PHILIPPE PETERS 5
where $p \in (0, 1)$ and $r$ is a positive integer. The relationship between
mean and variance is the opposite of the binomial, as mean $= r(1-p)/p$ and
variance $= r(1-p)/p^2$. The variance equals the mean divided by $p < 1$, so
the mean is smaller than the variance for the negative binomial distribution.
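As a quick numerical check of these moment formulas, the sketch below sums a truncated negative binomial probability mass function. It assumes the parameterization in which X counts the failures before the r-th success, P(X = k) = C(k+r-1, k) p^r (1-p)^k, which is consistent with the mean stated above. The function names and the truncation point `k_max` are illustrative choices, not taken from the text.

```python
import math

def neg_binomial_pmf(k, r, p):
    """P(X = k) when X counts failures before the r-th success (assumed form)."""
    return math.comb(k + r - 1, k) * p**r * (1 - p) ** k

def truncated_moments(r, p, k_max=500):
    """Approximate mean and variance by summing the pmf over k = 0..k_max.

    k_max is an arbitrary cut-off; it must be large enough for the
    neglected tail to be negligible for the chosen r and p.
    """
    probs = [neg_binomial_pmf(k, r, p) for k in range(k_max + 1)]
    mean = sum(k * q for k, q in enumerate(probs))
    second_moment = sum(k * k * q for k, q in enumerate(probs))
    return mean, second_moment - mean**2

r, p = 3, 0.4
mean, var = truncated_moments(r, p)
print(mean, r * (1 - p) / p)       # both close to 4.5
print(var, r * (1 - p) / p**2)     # both close to 11.25
```

With these values the variance (about 11.25) clearly exceeds the mean (about 4.5), as the text states.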
A good starting point to determine the most adequate frequency distribution
is therefore to check the relationship between mean and variance of the
observed frequency. If the observed variance is much higher (resp. lower)
than the observed mean, a negative binomial (resp. binomial) distribution
could be well-suited to model frequency.
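This first screening step can be sketched as a comparison of the sample variance with the sample mean. The function name, the `tolerance` band and its default value are hypothetical choices made for illustration. Returning "poisson" when mean and variance are roughly equal relies on the standard property of the Poisson distribution and is an assumption about which other candidates are under consideration.

```python
from statistics import mean, variance

def suggest_frequency_distribution(counts, tolerance=0.2):
    """Suggest a frequency model from the variance-to-mean ratio.

    counts: observed numbers of loss events per period (at least two values).
    tolerance: hypothetical band around 1 within which mean and variance
    are treated as roughly equal.
    """
    m = mean(counts)
    if m <= 0:
        raise ValueError("the mean number of events must be positive")
    ratio = variance(counts) / m  # sample variance over sample mean
    if ratio > 1 + tolerance:
        return "negative binomial"  # variance much higher than mean
    if ratio < 1 - tolerance:
        return "binomial"           # variance much lower than mean
    return "poisson"                # assumption: roughly equal moments

print(suggest_frequency_distribution([0, 0, 1, 0, 12, 0, 3, 0, 0, 9]))
print(suggest_frequency_distribution([2, 3, 2, 3, 2, 3, 2, 3]))
```

Such a rule only narrows the choice; a formal goodness-of-fit test is still needed to confirm the candidate.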
Other techniques to discriminate between these distributions include
goodness-of-fit tests such as the $\chi^2$ test. The idea of this test is to
split the population into $k$ adjacent “classes” of equal width, and then to
compute the following statistic:
$$\chi^2 = \sum_{j=1}^{k} \frac{(n_j - E_j)^2}{E_j}$$
where $n_j$ is the number of elements observed in class $j$ and $E_j$ is the
theoretical expected number of observations in the class. This test should be
interpreted as follows: the lower $\chi^2$, the better the fit.
If $H_0$ is true (that is, the observed series follows the tested
distribution), $\chi^2$ converges to a distribution that lies between the
chi-square distributions with $k-1$ and $k-m-1$ degrees of freedom (where
$m$ is the number of estimated parameters). Thus if
$\chi^2 > \chi^2_{k-1,1-\alpha}$, where $\chi^2_{k-1,1-\alpha}$ is the
$1-\alpha$ quantile of the asymptotic chi-square distribution, the null
hypothesis is rejected.^4 Finally, a rule of thumb to decide the number of
bins is that $k \geq 3$ and $E_j \geq 5$ for all $j$.
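The test described above can be sketched in a few lines. The function names are illustrative, and the example counts are made up purely to show the arithmetic. To stay dependency-free, the sketch hard-codes the standard 0.95 quantiles of the chi-square distribution for 1 to 10 degrees of freedom, so it only covers a 5% significance level and at most 11 classes.

```python
# Standard 0.95 quantiles of the chi-square distribution, keyed by
# degrees of freedom (values from the usual statistical tables).
CHI2_CRIT_95 = {
    1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
    6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307,
}

def chi_square_statistic(observed, expected):
    """Sum over classes of (n_j - E_j)^2 / E_j."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((n - e) ** 2 / e for n, e in zip(observed, expected))

def bins_are_adequate(expected):
    """Rule of thumb from the text: k >= 3 and E_j >= 5 for all j."""
    return len(expected) >= 3 and all(e >= 5 for e in expected)

def reject_h0(observed, expected):
    """Compare the statistic with the 0.95 quantile for k - 1 degrees of freedom."""
    critical = CHI2_CRIT_95[len(observed) - 1]
    return chi_square_statistic(observed, expected) > critical

# Made-up example: four classes, each with 25 expected observations.
observed = [18, 22, 27, 33]
expected = [25.0, 25.0, 25.0, 25.0]
print(bins_are_adequate(expected))               # True
print(chi_square_statistic(observed, expected))  # about 5.04
print(reject_h0(observed, expected))             # False, since 5.04 < 7.815
```

Using k - 1 degrees of freedom follows the rejection rule stated in the text; because the limiting distribution lies between the k - m - 1 and k - 1 cases, this threshold is the more conservative of the two for rejection.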
Severity distribution
The severity distribution models the economic impact of operational risk
loss events. Consequently, any strictly positive continuous distribution can
be used to model operational losses. However, operational risk databases are
often characterized by a large bulk of “high frequency/low impact” losses
and a few “low frequency/high impact” losses. Leptokurtic distributions
are thus most appropriate to model the severity distribution. Candidate
distributions include log-normal, log-logistic, Pareto or Weibull
distributions. Table 1.1 summarizes the probability density functions (PDF)
of these distributions.^5
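For reference, the four candidate densities can be written as short functions. The parameterizations below are common textbook forms and are an assumption: Table 1.1 may use different symbols or conventions. The parameter names (`mu`, `sigma`, `alpha`, `x_min`, `shape`, `scale`) are illustrative.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Log-normal density; ln(X) is normal with mean mu and std dev sigma."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def loglogistic_pdf(x, scale, shape):
    """Log-logistic density with scale > 0 and shape > 0 (assumed form)."""
    if x <= 0:
        return 0.0
    t = (x / scale) ** shape
    return (shape / scale) * (x / scale) ** (shape - 1) / (1 + t) ** 2

def pareto_pdf(x, alpha, x_min):
    """Pareto (type I) density with tail index alpha and minimum x_min."""
    if x < x_min:
        return 0.0
    return alpha * x_min**alpha / x ** (alpha + 1)

def weibull_pdf(x, shape, scale):
    """Weibull density with shape > 0 and scale > 0."""
    if x <= 0:
        return 0.0
    t = x / scale
    return (shape / scale) * t ** (shape - 1) * math.exp(-(t**shape))

# Evaluate each density at a loss of size 1 with unit parameters.
print(lognormal_pdf(1.0, 0.0, 1.0))    # 1 / sqrt(2 * pi)
print(loglogistic_pdf(1.0, 1.0, 1.0))  # 0.25
print(pareto_pdf(1.0, 2.0, 1.0))       # 2.0
print(weibull_pdf(1.0, 1.0, 1.0))      # exp(-1)
```

All four functions return zero outside the support, in line with the requirement that severity be modelled by a strictly positive distribution.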
To test the adequacy of the estimated distribution for the observed values,
goodness-of-fit statistics can again be calculated, for example by the