tics of these models (Sahlgren 2006, Padó and Lapata 2007, Peirsman,
Heylen and Speelman 2007, Heylen, Peirsman, Geeraerts and Speelman
2008); in recent years, a number of related approaches in corpus linguistics
have also paved the way. In line with its increased interest in corpora (see
e.g., Tummers, Heylen, and Geeraerts 2005), the cognitive-linguistic com-
munity in particular has grasped the importance of a usage-based study of
lexical semantics, based on more advanced techniques than just the extrac-
tion of examples from corpora. Such corpus-based approaches to lexical
semantics are the focus of a number of recent anthologies (e.g. Gries and
Stefanowitsch 2006, Stefanowitsch and Gries 2006) and were the topic of a
successful theme session at the 10th International Cognitive Linguistics
Conference. Advanced statistical methods, like clustering techniques or
correspondence analysis, are currently at the centre of attention.
A corpus-based study of lexical semantics can take one of two
perspectives. First, it is possible to focus on one polysemous
word, and investigate the syntactic or lexical features that correlate with the
occurrences of its several meanings. This semasiological approach is
represented by Gries’ (2006) study of the English verb to run. Gries labels
all occurrences of to run in ICE-GB and the Brown Corpus with a number
of tags that together form the behavioral profile of the verb. This profile
contains morphological features, syntactic properties of the clause, seman-
tic characteristics of the relevant participants, collocates of the verb in the
same clause and a paraphrase of the verb’s meaning. Gries then uses these
data to identify the distinct senses of to run, to find its prototypical sense,
and to determine how the senses can be combined in a network, among other
things. In essence, this approach is a computational alternative to the traditional
work of a lexicographer or lexicologist: it tries to identify the contexts that
go together with the specific senses of a word (Geeraerts 1997).
Second, it is also possible to study not just one word, but a set of words
and the differences and similarities between them. This more onomasiolog-
ical perspective is taken by Divjak and Gries (2006), who cluster verbs of
trying in Russian according to their behavior in a corpus. Similarly, Glynn
(2009) explores the differences in behavior between the verbs annoy, both-
er and hassle in British and American English.
Word space models of lexical semantics allow for both types of investi-
gation. On the one hand, they can be used to cluster the various occurrences
of a word into groups that often largely correspond to the several senses of
that word (Schütze 1998). On the other, they make it possible to find the
similarities between several words on the basis of their contexts in a corpus.
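
The second use can be illustrated with a minimal sketch. It assumes the simplest possible word space: raw co-occurrence counts within a small window, compared by cosine similarity. The toy corpus, the window size and the function names are invented for illustration and are not taken from the studies cited above; practical models typically use much larger corpora and weight the raw counts.

```python
import math
from collections import Counter

# Toy corpus, invented for illustration only.
corpus = [
    "the dog chased the cat",
    "the dog bit the postman",
    "the cat chased the mouse",
    "the cat bit the dog",
    "the car needs new tyres",
    "the car needs more petrol",
]


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within `window` positions."""
    vectors = {}
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    vectors.setdefault(target, Counter())[tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = context_vectors(corpus)
sim_dog_cat = cosine(vectors["dog"], vectors["cat"])
sim_dog_car = cosine(vectors["dog"], vectors["car"])

# Words that share contexts (dog, cat) score higher than words that do not (dog, car).
print(f"dog ~ cat: {sim_dog_cat:.2f}")
print(f"dog ~ car: {sim_dog_car:.2f}")
```

In this sketch dog and cat come out as more similar than dog and car because they share the context words chased and bit. The first use, grouping the occurrences of a single word into senses, would apply the same kind of vector to each occurrence rather than to each word type.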