quantified as the cosine of the angle between their context vectors. Because
wine and beer will probably occur together in more articles than beer and
car, we will again find a higher semantic relatedness between the former
word pair. Indeed, both on the syntagmatic and the paradigmatic axis, wine
and beer are closer to each other than they are to car.
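As a rough illustration of the cosine measure, the following minimal Python sketch compares three context vectors in a document-based model. The article counts are invented for the example and are not taken from our corpus, and the names context_vectors and cosine are our own choices for this illustration.

```python
import math

# Hypothetical toy counts: how often each target word occurs in each of
# five invented articles (a document-based model). Not corpus data.
context_vectors = {
    "wine": [3, 2, 0, 1, 0],
    "beer": [2, 3, 0, 0, 0],
    "car": [0, 0, 4, 1, 2],
}


def cosine(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # A word with no recorded contexts is treated as unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)


sim_wine_beer = cosine(context_vectors["wine"], context_vectors["beer"])
sim_wine_car = cosine(context_vectors["wine"], context_vectors["car"])
sim_beer_car = cosine(context_vectors["beer"], context_vectors["car"])
```

With these invented counts, wine and beer share most of their articles and therefore receive a much higher cosine than either word does with car, which mirrors the intuition described above.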
In short, word space models allow us to approximate our intuitions
about the semantic relationship between two words by simply modeling the
meaning of the words in terms of the contexts in which they occur. The
actual computational implementation goes somewhat further than our sketch
here. For instance, context vectors in practice rarely use the raw
co-occurrence frequencies of the features and the target word, since
these are heavily dependent on the nature of the features. Syntactic
relations that occur very frequently in the corpus (in syntax-based models)
or extremely long articles (in document-based models) will automatically
have high values for a large number of target words. This problem is usually tackled
by replacing the raw frequencies with a statistical measure like pointwise
mutual information, which indicates whether the target word and the feature
occur together more or less often than we would expect on the basis of their
individual frequencies. For these and other technical details, we refer the
interested reader to our more computationally oriented papers (Peirsman,
Heylen and Speelman 2007; Heylen, Peirsman, Geeraerts and Speelman 2008).
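The weighting step can be sketched in a few lines of Python. This is only a minimal illustration under our own assumptions, not the implementation used in the papers cited above: the co-occurrence counts are invented, the function name pmi_weights is hypothetical, and pointwise mutual information is computed in its basic form as the base-2 logarithm of the observed count divided by the count expected if word and feature were independent.

```python
import math

# Hypothetical raw co-occurrence counts (target word -> feature -> count).
# The feature "the" stands in for a very frequent, uninformative feature.
counts = {
    "wine": {"drink": 8, "the": 50, "bottle": 6},
    "beer": {"drink": 9, "the": 45, "bottle": 4},
    "car": {"drink": 1, "the": 60, "drive": 10},
}


def pmi_weights(counts):
    """Replace raw counts with pointwise mutual information (log base 2)."""
    total = sum(c for feats in counts.values() for c in feats.values())
    word_totals = {w: sum(feats.values()) for w, feats in counts.items()}
    feature_totals = {}
    for feats in counts.values():
        for f, c in feats.items():
            feature_totals[f] = feature_totals.get(f, 0) + c

    weighted = {}
    for w, feats in counts.items():
        weighted[w] = {}
        for f, c in feats.items():
            if c == 0:
                continue  # PMI is undefined for pairs that never co-occur
            # Count expected if the word and the feature were independent.
            expected = word_totals[w] * feature_totals[f] / total
            weighted[w][f] = math.log2(c / expected)
    return weighted


weights = pmi_weights(counts)
```

In this toy example the frequent feature "the" ends up with a weight close to zero for every target word, whereas "drink" receives a clearly positive weight for wine and beer and a negative one for car, even though its raw counts are much lower than those of "the".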
2.3. Case study
So far, word space models have proved their usefulness mainly in the field
of computational linguistics. It is our belief, however, that they can equally
be applied to more theoretical-linguistic research questions. Like other
advanced empirical approaches, they have the major advantage that they
can cope with far more examples than any manual analysis can, and that
they can help identify patterns that would otherwise remain hidden from
the human eye.
In this paper, we will apply word space models to an investigation of
language variation. In particular, we will focus on a corpus of newspaper
text, and try to find out in what way the use of the names of religions,
particularly islam ‘Islam’ and christendom ‘Christianity’, has changed since
the terrorist attacks of September 11, 2001. Through the empirical investigation of the
contexts in which these words are used, we will pin down changes in typical
contexts and, hence, shifts in media coverage. Our investigation can thus