Chapter 2. French liaison and the lexical repository 37
3. Results
3.1 Distributional analysis of liaison types
The 16,805 liaison occurrences produced in free and guided conversations are
distributed over 3,105 environments (or “types”) of liaison. Each environment
has its own token frequency, ranging from 1,318 down to 1. The data were
plotted on a log-log graph (Figure 2). Log-log graphs are two-dimensional
graphs of numerical data that use logarithmic scales on both the horizontal and
vertical axes, and can be used to examine the tail of a distribution.
In statistics and probability theory, plotting a distribution on a log-log graph
is common practice because it keeps even low-frequency data clearly visible.
In our analysis we plot the frequency of each liaison type along the y-axis and
the rank of each type by frequency along the x-axis.
If the points in the plot tend to fall on a straight line for large values
on the x-axis, the distribution is taken to have a power-law
tail (Jeong et al. 2000). Figure 2 displays each liaison type by rank order
(x-axis) against its number of occurrences in the corpus (y-axis).
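
The straight-line criterion follows from the form of a power law: if frequency f and rank r are related by f = C · r^(-a), then log f = log C - a · log r, which is linear in log r with slope -a. The following Python sketch illustrates the procedure on toy data. It is not the analysis used in this chapter: the type labels, the function names `rank_frequency` and `loglog_slope`, and the use of an ordinary least-squares fit are all assumptions made for illustration, and a least-squares slope on log-log axes is only a rough visual diagnostic, not a formal test for a power law.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs; rank 1 is the most frequent type."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log10(frequency) regressed on log10(rank)."""
    xs = [math.log10(rank) for rank, _ in pairs]
    ys = [math.log10(freq) for _, freq in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy corpus: four liaison types whose frequencies follow
# f = 12 / rank exactly (12, 6, 4, 3 tokens), so the slope should be -1.
toy_tokens = (["type_A"] * 12 + ["type_B"] * 6
              + ["type_C"] * 4 + ["type_D"] * 3)

pairs = rank_frequency(toy_tokens)
slope = loglog_slope(pairs)
print(pairs)
print(round(slope, 6))
```

On real corpus counts the points will scatter around the line, and the fit is usually restricted to the range of ranks where the plot looks straight.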
[Figure 2 plot. Title: Realisations of liaison in the PFC corpus, 16,805 tokens and 3,105 types. x-axis: rank (10^0 to 10^3); y-axis: occurrences (10^0 to 10^4). Annotations: 13 types account for 30% of total realizations; 50 types account for 50% of total realizations; 3,055 types account for the remaining 50%.]
Figure 2. Log-log plot of liaison environments (or ‘types’) in the PFC corpus (rank order
by number of occurrences).
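
The annotations in Figure 2 (for example, 50 types accounting for 50% of realizations) are statements about cumulative coverage: how many of the top-ranked types are needed to reach a given share of all tokens. A minimal Python sketch of that calculation follows. The function name `types_needed` and the toy frequencies are hypothetical and are not taken from the PFC data.

```python
def types_needed(frequencies, share):
    """Smallest number of top-ranked types whose tokens reach `share` of the total.

    `frequencies` holds one token count per type; `share` is a fraction
    between 0 and 1.
    """
    ordered = sorted(frequencies, reverse=True)
    total = sum(ordered)
    running = 0
    for n_types, freq in enumerate(ordered, start=1):
        running += freq
        if running >= share * total:
            return n_types
    return len(ordered)


# Hypothetical toy counts for four types (25 tokens in total).
toy_frequencies = [12, 6, 4, 3]

print(types_needed(toy_frequencies, 0.30))  # the top type alone has 12 of 25 tokens
print(types_needed(toy_frequencies, 0.50))  # two types are needed: 12 + 6 = 18 >= 12.5
```

Applied to the full list of per-type token counts, the same function would return the number of types behind each coverage threshold shown in the figure.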