lish (BNC http://www.natcorp.ox.ac.uk/ or
http://www.scottishcorpus.ac.uk/ for Scottish English).
We therefore advocate an approach which measures lexical frequency
relative to the local corpus from which the data were collected^9.
Because Varbrul requires discrete variants of all variables, it was
unfortunately necessary to convert the continuous measurement of lexical
frequency into discrete categories. Rather than create arbitrary cut points
in the data or force category divisions so that the number of tokens in
each was approximately equal, the raw results for (th): [f] in all variable
lexical items^10 were plotted against lexical frequency in a scattergram and
natural ‘bunches’ in the data were highlighted (see Figure 3). While these
categories do not contain an equal number of tokens or types, they
represent the frequency categories that naturally emerged from the data, and
so these were used to convert the continuous measurement of lexical
frequency into a categorical format for Varbrul.
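The idea of cutting at natural ‘bunches’ rather than at arbitrary or
equal-sized divisions can be sketched in code. The bunches in this study
were identified by eye from the scattergram; the sketch below instead
assumes that bunches are separated by the widest gaps between neighbouring
frequency values, which is only one possible stand-in for that visual
judgement. The function names and the example frequencies are hypothetical
and are not taken from the study.

```python
from bisect import bisect_right


def gap_cut_points(frequencies, n_categories):
    """Place cut points in the widest gaps between sorted frequency values.

    Assumption: a 'natural bunch' is a run of values separated from its
    neighbours by an unusually wide gap. This approximates, but is not,
    the visual inspection of a scattergram.
    """
    values = sorted(set(frequencies))
    if n_categories < 2 or n_categories > len(values):
        raise ValueError("n_categories must be between 2 and the number "
                         "of distinct frequency values")
    # Width of each gap, paired with the index of its lower neighbour.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    widest = sorted(gaps, reverse=True)[:n_categories - 1]
    # Cut halfway across each of the chosen gaps.
    return sorted((values[i] + values[i + 1]) / 2 for _, i in widest)


def frequency_category(frequency, cut_points):
    """Return the 0-based category index for one lexical frequency."""
    return bisect_right(cut_points, frequency)


# Hypothetical lexical frequencies (tokens per word type in a local corpus).
example_frequencies = [1, 2, 3, 20, 22, 25, 90, 95]
example_cuts = gap_cut_points(example_frequencies, n_categories=3)
example_codes = [frequency_category(f, example_cuts)
                 for f in example_frequencies]
```

As in the procedure described above, the resulting categories need not
contain equal numbers of types or tokens; the number of categories still has
to be chosen by the analyst.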
In order to achieve a valid Varbrul analysis, the factor groups must be
‘orthogonal’ (Guy 1988: 136), i.e. there must be minimal overlap between
the factor groups. This can often be difficult to achieve. For example, in
the ‘linguistic’ factors coded here, there is a certain amount of overlap
between the factor groups ‘place of (th) in the syllable’, ‘place of (th) in
the word’ and ‘word boundary’. Interactions (or associations) between social
factor groups are perhaps even more difficult to avoid as there is more
potential for overlap (see Bayley 2002: 131): individuals tend to form
friendship cliques
with others of the same sex, of roughly the same age and from the same
local area. It is extremely important to consider the effect of these
distributional interactions when conducting statistical analyses. In
Varbrul, it is possible to spot such interactions with the ‘crosstabs’
function because the cells of a crosstabulation will be unevenly occupied
when there are interactions between factor groups. We attempted to tease
apart any possible interactions between different factors influencing
variation by running the analysis repeatedly and including different factor
groups in the analysis each time. We then compared the results of each
analysis using a likelihood ratio test to find which provided the best ‘fit’
and therefore the best indication of the likely factors influencing this
variation^11. Table 3 is organized to show the factor groups in order of the
significance of their effect on the variation. Factor groups not selected as
significant are not shown in this table.
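The two checks described in this paragraph can be illustrated with a short
sketch. It does not reproduce Varbrul's own ‘crosstabs’ function or its
output; it assumes that each token is stored as a dictionary of factor
values and that the log likelihoods of two nested analyses are already
available. All function names, factor labels and numbers below are
hypothetical illustrations, not results from this study.

```python
import math
from collections import Counter
from itertools import product


def empty_cells(tokens, group_a, group_b):
    """Cross-tabulate two factor groups and list the unoccupied cells.

    Empty (or very sparsely filled) cells suggest that the two factor
    groups overlap rather than being orthogonal.
    """
    counts = Counter((t[group_a], t[group_b]) for t in tokens)
    levels_a = sorted({t[group_a] for t in tokens})
    levels_b = sorted({t[group_b] for t in tokens})
    return [cell for cell in product(levels_a, levels_b)
            if counts[cell] == 0]


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate (integer df >= 1)."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Start from the closed forms for df = 2 (even) or df = 1 (odd) ...
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # ... and step up with Q(a + 1, y) = Q(a, y) + y**a * e**-y / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, df_difference):
    """Compare two nested analyses; return the test statistic and p-value."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, df_difference)


# Hypothetical tokens in which sex and area overlap completely.
example_tokens = [
    {"sex": "F", "area": "north"},
    {"sex": "F", "area": "north"},
    {"sex": "M", "area": "south"},
]
overlap = empty_cells(example_tokens, "sex", "area")

# Hypothetical log likelihoods for an analysis without and with one extra
# factor group that adds two free parameters.
statistic, p_value = likelihood_ratio_test(-105.0, -100.0, df_difference=2)
```

A small p-value indicates that the analysis including the extra factor group
fits significantly better than the one without it; the test is only
meaningful when one analysis is nested within the other.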