Chapter 2. French liaison and the lexical repository 39
More specifically, the 13 most frequent types alone account for 30% of the
16,805 occurrences of enacted liaison, while as few as 50 types account for half of
the total number of liaisons. These high-frequency types are sparsely distributed:
the number of occurrences declines relatively sharply across the 13 most frequent
types. The tail of the distribution is equally remarkable, but for the opposite
reason: no fewer than 3,055 types are needed to cover about the same number of
realized liaisons, i.e. the remaining 50% of the realizations. In contrast to the
sparse head, these 3,055 types are crowded along the curve, since many types share
the same low frequency values, and the farther right we move along the curve, the
more types with the same frequency value are found. For this reason the tail of the
curve is much flatter than the head and eventually levels off.
In other words, liaison types occupying the tail of the curve have a very low
probability of occurrence in the corpus, but since they are very numerous they are
equally essential to the overall picture.
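The head/tail contrast described above can be checked on any list of per-type token counts. The following is a minimal sketch, not the procedure used for the PFC data: the function name `types_needed` and the toy counts are hypothetical, chosen only to show how one would count the top-ranked types required to reach a given share of all tokens.

```python
def types_needed(counts, share):
    """Number of top-ranked types needed to cover `share` of all tokens."""
    ranked = sorted(counts, reverse=True)   # most frequent type first
    total = sum(ranked)
    running = 0
    for rank, freq in enumerate(ranked, start=1):
        running += freq
        if running >= share * total:
            return rank
    return len(ranked)


# Hypothetical toy token counts per liaison type (NOT the PFC figures).
toy_counts = [40, 20, 10, 5, 5, 4, 4, 3, 3, 2, 1, 1, 1, 1]

head_30 = types_needed(toy_counts, 0.30)   # types covering the first 30%
head_50 = types_needed(toy_counts, 0.50)   # types covering the first 50%
tail_types = len(toy_counts) - head_50     # types sharing the remaining tokens
```

Applied to the counts reported in this chapter, the same computation would be expected to return 13 for the 30% threshold and 50 for the 50% threshold, leaving 3,055 types in the tail.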
Table 1 illustrates the nature of the two zones of the curve with some examples.
A few types at the top of the frequency ranking yield a relatively high cumulative
percentage. By contrast, the bottom zone of the ranking (corresponding to
the long tail of the curve) is occupied by a very large number of very rare liaison
types: almost 1,800 liaison types have a token frequency equal to 1.
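Figure 3 reports a power-law fit to this distribution obtained by maximum likelihood. The chapter does not spell out which estimator was used, so the sketch below is an assumption: it applies the standard continuous-case maximum likelihood estimator for the exponent, alpha = 1 + n / sum(ln(x_i / x_min)), and computes an empirical cumulative distribution P(X >= x) of the kind plotted on logarithmic axes. The function names, the choice `xmin=1.0`, and the toy frequencies are all hypothetical.

```python
import math


def fit_alpha(values, xmin=1.0):
    """Continuous-case MLE of the power-law exponent for values >= xmin."""
    tail = [v for v in values if v >= xmin]
    log_sum = sum(math.log(v / xmin) for v in tail)
    if log_sum == 0:
        raise ValueError("all values equal xmin; exponent is undefined")
    return 1.0 + len(tail) / log_sum


def empirical_ccdf(values):
    """Sorted (x, P(X >= x)) pairs for each distinct value x."""
    n = len(values)
    return [(x, sum(1 for v in values if v >= x) / n)
            for x in sorted(set(values))]


# Hypothetical token frequencies of eight liaison types (NOT the PFC data).
toy_freqs = [1, 1, 1, 1, 2, 2, 4, 8]

alpha_hat = fit_alpha(toy_freqs)       # estimated exponent
ccdf = empirical_ccdf(toy_freqs)       # points for a log-log plot
```

For integer frequency data with many values at or near 1, as here, a discrete estimator may be more appropriate than this continuous approximation; the sketch is meant only to make the idea of a maximum likelihood fit concrete.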
These data provide interesting challenges to usage-based and exemplarist
models of phonological processing, in which lexicon and grammar are integrated
and constrained by the same organizational principles (e.g., Bybee 1998;
[Figure 3: plot on logarithmic axes. x-axis: Freq, from 10^0 to 10^4; y-axis: P(Freq), from 10^-4 to 10^0.]
Figure 3. Cumulative distribution function for the lexical environments defining French
liaison in the PFC corpus and fitted power-law distribution function obtained through
maximum likelihood estimators.