114 Yves Peirsman, Kris Heylen and Dirk Geeraerts
Nevertheless, these approaches model the behaviour of a word quite differ-
ently from the analyses above: word space models look only at the surface
context of a target word – defined in terms of articles or paragraphs, context
words, or syntactic relations. As a result, they can do without any kind of
manual labelling, and drastically increase the amount of data we can deal
with. This computational approach to lexical semantics can therefore pro-
vide a useful quantitative tool in fields like variational linguistics, or Criti-
cal Discourse Analysis (CDA).
In short, while word space models may be new to the study of lexical
semantics, they have predecessors in the form of behavioural profiles,
quantitative onomasiological analyses and the time-honoured method of
manual lexicographic description. These advanced corpus-based techniques
have created the right atmosphere for the introduction of word space mod-
els in variational-linguistic research, or in fields that generally bring forth
more qualitative studies, like Critical Discourse Analysis.
2.2. Computational background
In computational linguistics, word space models of lexical semantics have
been around for quite a while now. In the literature a wide variety of ap-
proaches has been developed and discussed (see Schütze 1998, Lin 1998,
Purandare and Pedersen 2004, Sahlgren 2006, Padó and Lapata 2007 and
many others). The earlier models are often still the most popular ones, with
Latent Semantic Analysis (LSA, Landauer and Dumais 1997) and the
Hyperspace Analogue to Language (HAL, Lund and Burgess 1996) as the
two best-known examples.
Despite all these different implementations, all word space models have
the same goal: to approximate word meaning by modelling word use. They
do this by keeping track of the contexts in which a word appears. In our
case study below, we will make use of two types of word space models: a
document-based and a syntax-based approach. Document-based models
(Landauer and Dumais 1997) express the distribution of a word in terms of
the articles (documents) in which it appears. Two words are thus related if
they often appear in the same articles. A syntax-based model (Lin 1998),
by contrast, defines the context of a target word as the context words with
which it is syntactically related, plus the type of syntactic relation involved.
Here, two words are related if they often fulfil the same syntactic role or
function in a sentence.
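The contrast between the two context definitions can be illustrated with a minimal sketch. It rests on several assumptions that are not part of the models cited above: raw co-occurrence counts without any weighting or dimensionality reduction, cosine similarity as the measure of relatedness, and invented toy data. The function names (document_vectors, syntax_vectors, cosine) and the relation labels (object_of, modified_by) are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def document_vectors(documents):
    """Document-based space: each word is described by the documents
    (here: list indices) in which it occurs."""
    vectors = {}
    for doc_id, tokens in enumerate(documents):
        for token in tokens:
            vectors.setdefault(token, Counter())[doc_id] += 1
    return vectors


def syntax_vectors(triples):
    """Syntax-based space: each target word is described by
    (relation, context word) features taken from dependency triples."""
    vectors = {}
    for target, relation, context in triples:
        vectors.setdefault(target, Counter())[(relation, context)] += 1
    return vectors


# Invented toy data (assumption): three tiny "articles" ...
documents = [
    ["doctor", "nurse", "hospital"],
    ["doctor", "nurse", "patient"],
    ["banana", "fruit", "market"],
]
# ... and a handful of hand-written (target, relation, context) triples.
triples = [
    ("coffee", "object_of", "drink"),
    ("tea", "object_of", "drink"),
    ("coffee", "modified_by", "hot"),
    ("tea", "modified_by", "hot"),
    ("car", "object_of", "drive"),
]

doc_space = document_vectors(documents)
syn_space = syntax_vectors(triples)

# Words that share articles are close in the document-based space.
doc_sim_related = cosine(doc_space["doctor"], doc_space["nurse"])
doc_sim_unrelated = cosine(doc_space["doctor"], doc_space["banana"])

# Words that fill the same syntactic slots are close in the syntax-based space.
syn_sim_related = cosine(syn_space["coffee"], syn_space["tea"])
syn_sim_unrelated = cosine(syn_space["coffee"], syn_space["car"])

print(doc_sim_related, doc_sim_unrelated)
print(syn_sim_related, syn_sim_unrelated)
```

In this sketch the only difference between the two spaces is what counts as a feature: a document identifier in the first case, a pair of syntactic relation and context word in the second. A realistic model would in addition weight the counts and draw the triples from a parsed corpus.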