Advances in Sociophonetics

38 Bernard Laks, Basilio Calderone and Chiara Celata

In Figure 2 we recognize a typical distribution, first described by Zipf (1949) and later generalized by Mandelbrot as the Mandelbrot-Zipf distribution (Brillouin 1959). This distribution derives from the power-law and is close to the distribution pattern described at the end of the 19th century by Pareto. When the frequency with which an event occurs varies as a power of some attribute of that event (e.g. its size), the frequency is said to follow a power-law. In linguistics, one famous example of a power-law function is Zipf's law in corpus analysis, according to which the frequency of a word item in a text is inversely proportional to its frequency rank (i.e., the second most frequent word item occurs half as often as the most frequent item; Zipf 1949). In the case of Mandelbrot-Zipf distributions, a clear distinction emerges between two zones of the curve (Wimmer & Altman 1999): the peak (or head) zone and the dispersal (or tail) zone. The peak zone concentrates a very small number of highly productive events. It thus differs sharply from the dispersal zone, which approaches zero asymptotically and contains a large number of infrequent events with a marginal impact on the process. The latter condition is known as the phenomenon of the 'long tail'. The 'body' of the curve is a gradual shift from the peak to the dispersal distribution.

The frequency analysis confirmed that the distribution of the lexical environments defining French liaison follows a power-law distribution to a significant extent. Following Clauset et al. (2009), we used the method of maximum likelihood to estimate the scaling exponent and the lower bound value of the distribution, and then calculated a goodness-of-fit score. In the limit of large sample size, the maximum likelihood estimators converge on the correct value of the scaling exponent with probability 1, for both the discrete and the continuous power-laws (Clauset et al. 2009).
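As an illustration of the maximum likelihood step, the following minimal sketch implements the closed-form estimator for the continuous power-law given in Clauset et al. (2009), alpha = 1 + n / sum(ln(x_i / xmin)), together with its large-sample standard error. It is not the code used for the liaison data: the function name `fit_power_law_exponent`, the fixed lower bound `xmin = 1.0`, and the synthetic sample are assumptions made for the example, and frequency counts are strictly discrete, for which Clauset et al. give a separate estimator.

```python
import math
import random


def fit_power_law_exponent(values, xmin):
    """Continuous maximum likelihood estimate of the scaling exponent.

    Uses alpha = 1 + n / sum(ln(x_i / xmin)) over the observations
    x_i >= xmin and returns (alpha, standard_error).
    """
    tail = [x for x in values if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum
    std_err = (alpha - 1.0) / math.sqrt(n)  # large-sample approximation
    return alpha, std_err


# Synthetic check (hypothetical data, not the PFC corpus): draw from a
# known power law by inverse-transform sampling and recover the exponent.
rng = random.Random(42)
true_alpha, xmin = 2.5, 1.0
sample = [
    xmin * (1.0 - rng.random()) ** (-1.0 / (true_alpha - 1.0))
    for _ in range(20000)
]
alpha_hat, se = fit_power_law_exponent(sample, xmin)
print(f"estimated alpha = {alpha_hat:.3f} +/- {se:.3f}")
```

With a sample of this size the estimate should fall close to the true exponent, which is the convergence property referred to above. In Clauset et al.'s procedure the lower bound is itself estimated, by choosing the value that minimizes the Kolmogorov-Smirnov distance between the data and the fitted model; the sketch fixes it for brevity.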
Figure 3 shows the cumulative distribution function of the frequency of the lexical components involved in French liaison, together with the power-law function fitted on the basis of the maximum likelihood estimators. The fitted power-law function was subsequently tested by calculating its goodness-of-fit with the actual liaison data. We used the Kolmogorov-Smirnov test (Clauset et al. 2009: 14) to calculate the p value of the goodness-of-fit between the power-law distribution and the liaison data, as reported in Figure 3. The p value should be greater than 0.1 for the power-law to be a plausible hypothesis for the data; otherwise the hypothesis has to be rejected. The obtained value p = 0.111 exceeds this threshold, indicating that a power-law distribution is a plausible model of the distribution of the lexical environments defining French liaison. Thus, according to the results of this global inspection of PFC data, an extremely small number of types represent quantitatively the core of the process of French liaison, whereas the entire set of the remaining types accounts for no more than half of the actual occurrences.
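The logic of the goodness-of-fit test can be sketched as follows, under simplifying assumptions. The Kolmogorov-Smirnov distance is the largest gap between the empirical cumulative distribution of the data and the fitted power-law; the p value is the fraction of synthetic power-law samples whose own distance from their own fit is at least as large as the observed one. The sketch treats the data as continuous and keeps the lower bound fixed, whereas the full procedure of Clauset et al. (2009) re-estimates the lower bound for every synthetic sample. All function names and the test data are hypothetical.

```python
import math
import random


def fit_alpha(xs, xmin):
    """Closed-form continuous maximum likelihood estimate of the exponent."""
    tail = [x for x in xs if x >= xmin]
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def ks_distance(xs, xmin, alpha):
    """Largest gap between the empirical CDF and the power-law CDF."""
    tail = sorted(x for x in xs if x >= xmin)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs(model - (i + 1) / n))
    return d


def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from a continuous power law."""
    return [
        xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
        for _ in range(n)
    ]


def bootstrap_p_value(xs, xmin, n_boot=200, seed=0):
    """Share of synthetic samples that fit their own model no better than xs."""
    rng = random.Random(seed)
    tail = [x for x in xs if x >= xmin]
    alpha = fit_alpha(tail, xmin)
    d_obs = ks_distance(tail, xmin, alpha)
    exceed = 0
    for _ in range(n_boot):
        synth = sample_power_law(len(tail), xmin, alpha, rng)
        # Refit each synthetic sample before measuring its distance.
        if ks_distance(synth, xmin, fit_alpha(synth, xmin)) >= d_obs:
            exceed += 1
    return exceed / n_boot


# Hypothetical usage: data that are clearly not power-law distributed
# (uniform on [1, 2)) should yield a p value near zero.
rng = random.Random(1)
not_power_law = [1.0 + rng.random() for _ in range(500)]
print(bootstrap_p_value(not_power_law, xmin=1.0, n_boot=100))
```

Under this reading, a large p value means that the observed deviation from the fitted power-law is no larger than what sampling noise alone would produce, which is why values above the 0.1 threshold leave the power-law hypothesis standing rather than rejecting it.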
