38 Bernard Laks, Basilio Calderone and Chiara Celata
Figure 2 shows a typical distribution, first described by Zipf (1949) and later
generalized by Mandelbrot as the Mandelbrot-Zipf distribution (Brillouin 1959).
This distribution belongs to the family of power laws and is close to the
distribution pattern described at the end of the 19th century by Pareto.
When the frequency with which an event occurs varies as a power of some attribute
of that event (e.g. its size), the frequency is said to follow a power law. In
linguistics, one famous example of a power-law function is Zipf's law in corpus
analysis, according to which the frequency of a word item in a text is inversely
proportional to its frequency rank (i.e., the second most frequent word item occurs
half as often as the most frequent item; Zipf 1949).
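The rank-frequency relation just described can be illustrated with a minimal Python sketch. The function names (rank_frequency, zipf_expected) and the toy token list are assumptions introduced here for illustration only; they are not taken from the study, and the exponent of 1 corresponds to the idealized form of Zipf's law.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens)
    # Sort by decreasing count; break ties alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_expected(top_count, rank, exponent=1.0):
    """Frequency predicted at a given rank under an idealized Zipf's law."""
    return top_count / rank ** exponent


# Toy token list, invented for illustration (not corpus data).
tokens = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3

table = rank_frequency(tokens)
top_count = table[0][2]
for rank, word, count in table:
    # Observed count next to the count predicted from the top-ranked item.
    print(rank, word, count, zipf_expected(top_count, rank))
```

In real corpora the observed counts only approximate the predicted ones, which is one motivation for the more flexible Mandelbrot-Zipf form.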
In the case of Mandelbrot-Zipf distributions, a clear distinction emerges
between two zones of the curve (Wimmer & Altman 1999): the peak (or head)
zone and the dispersal (or tail) zone. The peak zone concentrates a very small
number of highly productive events. It thus differs sharply
from the dispersal zone, an asymptotically zero zone over which a large number of
infrequent events with a marginal impact on the process are distributed. The latter
condition is known as the phenomenon of the ‘long tail’. The ‘body’ of the curve
corresponds to the gradual transition from the peak zone to the dispersal zone.
The frequency analysis confirmed that the distribution of the lexical environments
defining French liaison is consistent with a power-law distribution. Following
Clauset et al. (2009), we used the method of maximum likelihood to estimate the
scaling exponent and the lower bound value of the distribution. In the limit of
large sample size, the maximum likelihood estimators converge on the correct
value of the scaling exponent with probability 1 for
both the discrete and the continuous power laws (Clauset et al. 2009). Figure 3
shows the cumulative distribution function of the frequency of the lexical components
involved in French liaison and the power-law function fitted on the
basis of the maximum likelihood estimators. The fitted power-law function was
subsequently tested by calculating its goodness-of-fit to the actual liaison data.
We used the Kolmogorov-Smirnov test (Clauset et al. 2009: 14) to calculate the
p value of the goodness-of-fit between the fitted power-law distribution and
the liaison data reported in Figure 3. The p value must be greater than 0.1 for
the power-law hypothesis to be considered plausible; otherwise the hypothesis has
to be rejected. The obtained value, p = 0.111, indicates that the power-law
hypothesis cannot be rejected: a power-law distribution is a plausible description
of the distribution of the lexical environments defining French liaison.
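The logic of the fitting and testing procedure can be sketched in Python as follows. This is a simplified illustration, not the procedure actually run on the liaison data: it assumes a continuous power law (the liaison counts are discrete), takes the lower bound xmin as given instead of estimating it, and resamples only from the fitted tail, whereas Clauset et al. (2009) also re-estimate the lower bound for every synthetic sample. All function names and the synthetic data are hypothetical.

```python
import math
import random


def fit_alpha(xs, xmin):
    """Continuous maximum likelihood estimate of the scaling exponent."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical and the fitted CDF above xmin."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare the model with the empirical step just before and after x.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Draw n values from a continuous power law by inverse transform."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def bootstrap_p_value(xs, xmin, n_boot=200, seed=0):
    """Fraction of synthetic samples that fit worse than the observed data."""
    rng = random.Random(seed)
    tail = [x for x in xs if x >= xmin]
    alpha = fit_alpha(tail, xmin)
    observed = ks_distance(tail, xmin, alpha)
    exceed = 0
    for _ in range(n_boot):
        synth = sample_power_law(len(tail), xmin, alpha, rng)
        # Refit each synthetic sample before measuring its own KS distance.
        exceed += ks_distance(synth, xmin, fit_alpha(synth, xmin)) >= observed
    return alpha, observed, exceed / n_boot


# Synthetic data drawn from a known power law (illustration only).
rng = random.Random(1)
data = sample_power_law(2000, 1.0, 2.5, rng)
alpha, distance, p = bootstrap_p_value(data, xmin=1.0)
print(alpha, distance, p)
```

Under the decision rule stated above, a p value above 0.1 would leave the power-law hypothesis standing, while a smaller value would lead to its rejection.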
Thus, according to the results of this global inspection of the PFC data, an
extremely small number of types quantitatively represent the core of the process
of French liaison, whereas the entire set of the remaining types accounts for no
more than half of the actual occurrences.
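The head-versus-tail split summarized here can be quantified as the share of occurrences covered by the top-ranked types. The following Python sketch shows one way to compute it; the function names and the frequency list are invented for illustration and do not reproduce the PFC counts.

```python
def coverage_of_top(counts, k):
    """Share of all occurrences accounted for by the k most frequent types."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


def types_needed(counts, share=0.5):
    """Smallest number of top-ranked types whose occurrences reach `share`."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    running = 0
    for k, count in enumerate(ordered, start=1):
        running += count
        if running >= share * total:
            return k
    return len(ordered)


# Invented per-type frequencies with a heavy head and a long tail.
counts = [500, 250, 120, 60, 30, 15, 10, 5, 5, 5]

print(coverage_of_top(counts, 1))   # share covered by the single top type
print(types_needed(counts, 0.5))    # types needed to reach half the tokens
print(types_needed(counts, 0.9))    # types needed to reach 90% of the tokens
```

Applied to per-type liaison frequencies, the same two functions would indicate how few types are needed to reach half of the occurrences.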