will measure the distance between those lexical fields as a function of the
distance between the individual words, both before and after 9/11. In this
way, we can study whether the perception of Islam and Christianity has
shifted in relation to these four topics.
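To make the idea of a field-to-field distance concrete, here is a minimal sketch. It assumes that the distance between two lexical fields is the average cosine distance over all word pairs drawn from the two fields; the text above only says the field distance is a function of the word distances, so this averaging is an illustrative assumption, as are the function names and the toy vectors.

```python
import numpy as np

def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def field_distance(field_a, field_b, vectors):
    """Average cosine distance over all word pairs from two lexical fields.

    Averaging is an assumption made for illustration; any other
    aggregate of the pairwise distances could be substituted here.
    """
    pairs = [(a, b) for a in field_a for b in field_b]
    total = sum(cosine_distance(vectors[a], vectors[b]) for a, b in pairs)
    return total / len(pairs)

# Toy two-dimensional word vectors, invented purely for illustration.
toy_vectors = {
    "islam": [1.0, 0.0],
    "moslim": [1.0, 1.0],
    "oorlog": [0.0, 1.0],
    "strijd": [0.0, 2.0],
}

print(field_distance(["islam", "moslim"], ["oorlog", "strijd"], toy_vectors))
```

Running the same function once on vectors built from the pre-9/11 corpus and once on vectors built from the post-9/11 corpus would give the two values to compare.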
Lexical fields are notoriously hard to delimit. Moreover, constructing them
manually may introduce arbitrary or subjective judgments. We can, however,
use document-based word space models to define our lexical fields
automatically. A lexical field is then operationalized as the set of words
with the tightest document-based relation to a central word like terrorism
or culture. This is the approach we will take here.
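The operationalization above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes raw word-by-document counts and cosine similarity, whereas the actual weighting scheme and similarity measure are not specified in this passage. The helper names and the toy documents are hypothetical.

```python
from collections import Counter
import numpy as np

def document_vectors(documents):
    """Map each word to its vector of counts across the documents."""
    vocab = sorted({w for doc in documents for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(documents)))
    for j, doc in enumerate(documents):
        for w, c in Counter(doc).items():
            matrix[index[w], j] = c
    return vocab, matrix

def lexical_field(central, documents, k):
    """Return the central word followed by its k nearest neighbours."""
    vocab, matrix = document_vectors(documents)
    # Every vocabulary word occurs in at least one document, so no norm is zero.
    unit = matrix / np.linalg.norm(matrix, axis=1)[:, None]
    sims = unit @ unit[vocab.index(central)]
    # Stable sort keeps alphabetical order among words with equal similarity.
    order = np.argsort(-sims, kind="stable")
    ranked = [vocab[i] for i in order if vocab[i] != central]
    return [central] + ranked[:k]

# Toy tokenized documents, invented purely for illustration.
toy_documents = [
    ["terrorisme", "aanslag", "kaping"],
    ["terrorisme", "aanslag"],
    ["cultuur", "religie"],
    ["cultuur", "kaping"],
]

print(lexical_field("terrorisme", toy_documents, k=2))
```

Words that tend to occur in the same documents as the central word end up with similar count vectors and therefore rank highest, which is what makes the relation document-based.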
There are a number of ways in which this undertaking can be carried out.
First, it can be argued that lexical fields are far from stable entities. It
would be no surprise if the lexical field of terrorism underwent some
substantial changes after the September 11 attacks. For each of the central
words terrorisme ‘terrorism’, oorlog ‘war’, religie ‘religion’ and cultuur
‘culture’ we therefore defined two fields – one on the basis of the pre-9/11
corpus and one on the basis of the post-9/11 corpus. Each time we included
the 20 words most closely related to the central word, together with that
central word, without manual correction. Because we also wanted to include
parts of speech other than nouns, we extended the set of possible nearest
neighbors from the 10,000 most frequent nouns in the corpus to all words
with a frequency of 200 or more. The ten words most closely related to
terrorisme, for example, now look like this:
Before 9/11: terrorisme ‘terrorism’, terrorist ‘terrorist’, aanslag ‘attack’,
Libisch ‘Libyan’, catastrofaal ‘catastrophic’, Tsetjeens ‘Chechen’,
terroristisch ‘terrorist (adj)’, kaping ‘hijack (noun)’,
moslimrebel ‘Muslim rebel’, moslimextremist ‘Muslim extremist’
After 9/11: terrorisme ‘terrorism’, strijd ‘battle’, oorlog ‘war’,
terrorist ‘terrorist’, militair ‘military’, bondgenoot ‘ally’, 11 ‘11’,
Amerikaans ‘American (adj)’, terroristisch ‘terrorist (adj)’,
internationaal ‘international’
Apart from a few spurious words, these automatically collected sets of
words appear very reasonable indeed. Before 9/11, the lexical field of
terrorism is a mixed bag of 21 words referring to a number of political and
religious issues: the relationship between Russia and Chechnya, the Taliban,
Libya and Islam. After 9/11 these have disappeared and been replaced by