Chapter 2. French liaison and the lexical repository 41
frequency that is only slightly higher than or equal to 1. Therefore, memorization
alone cannot account for the whole story. And, contrary to Bybee’s (2005) predic-
tion that low frequency constructions tend to be lost, what we found here is that
a very long list of very infrequent lexical environments accounts for 40–50% of the
occurrences of liaison in the corpus.
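The proportion discussed above can be illustrated with a minimal sketch that counts how many tokens belong to environments attested only once. The function name `low_frequency_share`, the `max_freq` threshold and the toy environment labels are hypothetical; the data are invented for illustration and are not drawn from the PFC corpus.

```python
from collections import Counter


def low_frequency_share(tokens, max_freq=1):
    """Return the fraction of all tokens whose type occurs at most max_freq times."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum the token counts of every type at or below the frequency threshold.
    low = sum(c for c in counts.values() if c <= max_freq)
    return low / total


# Toy data (hypothetical labels, not PFC data): two frequent environments
# and four environments attested only once.
toy = (
    ["les_amis"] * 4
    + ["un_ami"] * 2
    + ["petit_enfant", "grand_homme", "tout_entier", "chez_elle"]
)
share = low_frequency_share(toy, max_freq=1)
```

In this toy sample, four of the ten tokens come from environments that occur only once. On real data, the same count over the attested liaison environments would show how much of the corpus rests on the low-frequency tail.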
We suggest therefore that a productive process of generalization has to be
postulated, in order to account for the existence of such a long list of very infre-
quent liaison constructions. This is a potentially open list, inasmuch as some
relevant linguistic feature (either morphosyntactic, or phonological, or both,
according to the individual cases) could in principle trigger the process of gen-
eralization even further.
3.2 Distributional analysis of liaison consonants
As a further step in the analysis, we wanted to evaluate whether the obtained dis-
tribution is consistently represented in some relevant subgroups of data defined by
‘system-internal’ and ‘system-external’ factors. As an internal factor we considered
the phonological nature of the linking consonant, limited to the three most frequent
consonant types in the corpus, namely /n/, /z/ and /t/. As external
factors we included the two sociolinguistic factors of age and educational level, as
encoded in the PFC corpus. The former is known to slightly affect the production
of liaison inasmuch as older speakers are generally said to produce more liaisons in
facultative contexts than younger speakers (e.g., Malécot 1975; Ashby 1981; Booij &
de Jong 1987; Ranson 2008; Durand et al. 2011). The latter is still a poorly investigated
factor (Durand et al. 2011: 121), and the PFC database appears to have strong
potential for future research. By introducing phonological and sociolinguistic vari-
ables into the picture, whose influence on liaison production is either well-known
or still a matter of debate, we wanted to verify whether the power-law statistical
distribution of enacted liaisons varies according to different subgroups of data or
whether, on the contrary, it resists any manipulation of the corpus.
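One rough way to make this kind of subgroup comparison concrete is to build a rank-frequency list for each subgroup and compare the slopes of a straight line fitted on log-log axes. The sketch below assumes that approach. The text does not state how the distributions were fitted, and a least-squares fit on log-log data is only a coarse diagnostic of power-law behaviour. The function names and the `(group, environment)` record format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def rank_frequency(tokens):
    """Return type frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank)."""
    if len(freqs) < 2:
        raise ValueError("need at least two ranked frequencies")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_by_group(records):
    """records: iterable of (group, environment) pairs, one per liaison token."""
    groups = defaultdict(list)
    for group, environment in records:
        groups[group].append(environment)
    return {g: loglog_slope(rank_frequency(envs)) for g, envs in groups.items()}


# Toy records (hypothetical): frequencies 12, 6, 4, 3 follow f = 12 / rank,
# so the fitted slope for this group is close to -1.
records = (
    [("z", "e1")] * 12 + [("z", "e2")] * 6 + [("z", "e3")] * 4 + [("z", "e4")] * 3
)
slopes = slopes_by_group(records)
```

Similar slopes across consonant, age, or education subgroups would be consistent with a distribution that resists partitioning of the corpus. Markedly different slopes would point to subgroup-specific behaviour.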
Figure 4 illustrates the data split by consonant type (with /n/, /z/ and /t/ as
the relevant consonants) compared to the global distribution of the liaison data.
The top of the figure is occupied by a reduced version of Figure 2, reproduced
here to facilitate internal comparisons. The three bottom diagrams refer to the
liaison environments realized by /n/, /t/ and /z/, respectively. The sibilant fricative
is the most frequent liaison consonant in the corpus, with 7,840 occurrences
(47% of the corpus). The nasal is also very frequent
(6,449 occurrences, or 39% of the corpus), while
the alveolar stop occurs in a smaller proportion (2,342 occurrences, almost
