Applying word space models to sociolinguistics 121
3.2.1. Syntax-based distribution
One way of determining how the distribution of christendom and islam has
changed is to look at the words in the corpus whose syntax-based
distribution is most similar to that of each target word. We therefore
compared the context vectors of christendom and islam to those of the
10,000 most frequent nouns in the corpus, and selected from those the 100
nouns with the most similar context vectors. We refer to these nouns as the
100 nearest neighbors of christendom and islam. A comparison of these
lists, both between the two subcorpora and between the two target words,
brings to light some interesting differences.
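The neighbor selection described above can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used: the similarity measure is not named in this passage, so cosine similarity is assumed here, and the function names and the toy vectors (`target`, `near`, `mid`, `far`) are invented for the example and are not corpus data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an empty context vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def nearest_neighbors(target, vectors, candidates, k=100):
    """Return the k candidate words most similar to target, best first."""
    scored = [
        (cosine(vectors[target], vectors[word]), word)
        for word in candidates
        if word != target
    ]
    # Sort by descending similarity; break ties alphabetically for stability.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]

# Invented toy context vectors, for illustration only.
toy_vectors = {
    "target": [1.0, 2.0, 0.0],
    "near": [1.0, 2.0, 0.1],
    "mid": [0.9, 1.5, 0.5],
    "far": [0.0, 0.1, 3.0],
}
toy_neighbors = nearest_neighbors("target", toy_vectors, list(toy_vectors), k=2)
```

In the setting of the text, `candidates` would be the 10,000 most frequent nouns and `k` would stay at 100, with one call per target word and per subcorpus.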
First we want to find out how the use of islam and christendom has
changed after 9/11, and whether we see an increased similarity with
terrorisme ‘terrorism’, for instance. We therefore contrast the lists of
nearest neighbors to islam and christendom before and after 9/11. For each
nearest neighbor, we calculate its difference in rank between the two
subcorpora, in order to discover which neighbors have climbed on the list,
and which ones have fallen. If a nearest neighbor does not appear in one of
the lists, it is automatically assigned rank 101 for that list. Moreover,
instead of using the original ranks, we compute the difference between the
natural logarithms of the ranks. This logarithmic scale ensures that
differences far down in the list of nearest neighbors are treated as less
important than those at the top: for instance, we want the difference
between ranks 1 and 20 to be much larger than that between ranks 81 and 100.
Let us give an example. Moslim ‘Muslim’ was the 16th nearest neighbor
to islam before 9/11, but climbs to 6th place afterwards. Its difference
score, the log rank before 9/11 minus the log rank after, is therefore
ln(16) - ln(6) = 0.981. Koran, the 19th nearest neighbor to islam after
9/11, does not appear in the list before 9/11. Its difference score is
therefore ln(101) - ln(19) = 1.671. Calculated thus, the ten highest climbers
of islam and christendom are given in Table 1. The most striking item in
Table 1 is terrorisme ‘terrorism’: the highest climber of islam (position 12
after 9/11) is only the 11th highest climber for christendom (position 50
after 9/11). The table also shows a tighter link between fundamentalisme
‘fundamentalism’ and both religions after 9/11, and between jihad ‘jihad’
and islam. The other highest climbers are either more neutral in meaning or
display an expected link with one of the two religions (e.g., Koran ‘Quran’
and islam). In short, there is indeed a notable increase in syntax-based
relatedness between islam and a number of words related to terrorism. This
increase is far less clear with christendom.
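The rank-difference scoring used in this subsection can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original implementation: the function names are hypothetical, and the score is taken as the log rank before 9/11 minus the log rank after, which is the direction that yields the positive scores 0.981 and 1.671 reported for Moslim and Koran.

```python
import math

ABSENT_RANK = 101  # rank given to a word missing from a top-100 list (as in the text)

def rank_of(word, neighbors):
    """1-based rank of word in a ranked neighbor list, or ABSENT_RANK if missing."""
    try:
        return neighbors.index(word) + 1
    except ValueError:
        return ABSENT_RANK

def climb_score(rank_before, rank_after):
    """Log-rank difference; positive when a word has climbed after 9/11."""
    return math.log(rank_before) - math.log(rank_after)

def highest_climbers(before, after, n=10):
    """Return the n (word, score) pairs with the largest climb scores."""
    words = set(before) | set(after)
    scores = {
        word: climb_score(rank_of(word, before), rank_of(word, after))
        for word in words
    }
    # Sort by descending score; break ties alphabetically for stability.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:n]

# Ranks taken from the worked example in the text.
moslim_score = round(climb_score(16, 6), 3)
koran_score = round(climb_score(ABSENT_RANK, 19), 3)

# The log scale weights changes near the top of the list more heavily.
top_gap = climb_score(20, 1)
bottom_gap = climb_score(100, 81)
```

With this sign convention, `top_gap` (ranks 1 versus 20) comes out far larger than `bottom_gap` (ranks 81 versus 100), which is the weighting the logarithmic scale is meant to provide.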