Advances in Cognitive Sociolinguistics (Cognitive Linguistic Research)


116 Yves Peirsman, Kris Heylen and Dirk Geeraerts


        d.obj_  d.obj_  d.obj_   d.obj_  pp.obj_    pp.obj_    has.adj_  has.adj_  has.adj_
        drink   buy     prefer   park    glass_of   bottle_of  red       Belgian   new

wine    1       0       0        0       1          1          2         0         0
beer    2       0       0        0       0          1          0         1         0
car     0       1       1        1       0          0          1         0         0

Figure 1. Syntax-based context vectors of the words wine, beer, and car
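Context vectors like those in Figure 1 are built by counting how often each target word occurs with each contextual feature. As a minimal illustration (not the authors' implementation), the counting step could be sketched in Python as follows; the list of dependency triples is invented for the example, and the feature names follow Figure 1's relation_head convention:

```python
from collections import Counter

# Hypothetical dependency triples (head, relation, dependent) as they
# might come out of a parsed corpus; purely illustrative data.
triples = [
    ("drink", "d.obj", "wine"),
    ("drink", "d.obj", "beer"),
    ("drink", "d.obj", "beer"),
    ("buy", "d.obj", "car"),
    ("red", "has.adj", "wine"),
    ("red", "has.adj", "wine"),
    ("red", "has.adj", "car"),
]

# For each target noun, count how often each contextual feature occurs.
context_vectors = {}
for head, relation, target in triples:
    feature = f"{relation}_{head}"
    context_vectors.setdefault(target, Counter())[feature] += 1

print(context_vectors["wine"])
# Counter({'has.adj_red': 2, 'd.obj_drink': 1})
```

Each Counter is a sparse context vector: features never observed with a word simply have count zero, which matters in practice because real vectors have thousands of dimensions, most of them empty.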


They indicate, for instance, that wine occurs as the direct object of drink
once, beer twice and car never. In reality, of course, the vectors will have
far more dimensions and often much higher values than 1 or 2. However,
even on the basis of this simple example, it is clear that the two most similar context vectors are those of wine and beer. This is confirmed when we
compute the quantitative similarity between the two vectors. In the litera-
ture on word space models, the most popular approach is to calculate the
cosine of the angle described by the two vectors (see e.g., Bullinaria and
Levy 2008). This metric gives us the following figures:


cos(wine, beer) = 0.46
cos(wine, car) = 0.38
cos(beer, car) = 0.0

We have now reached the desired outcome: wine and beer are indeed more
paradigmatically related to each other than to car. The word pair wine – car
also has a non-zero cosine value, because both words appear with red as a
modifying adjective. Beer never does.
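These figures can be reproduced directly from the vectors in Figure 1. The following sketch computes the cosine of the angle between two context vectors (the dot product divided by the product of the vector lengths):

```python
import math

# Syntax-based context vectors from Figure 1; the nine dimensions are
# d.obj_drink, d.obj_buy, d.obj_prefer, d.obj_park, pp.obj_glass_of,
# pp.obj_bottle_of, has.adj_red, has.adj_Belgian, has.adj_new.
vectors = {
    "wine": [1, 0, 0, 0, 1, 1, 2, 0, 0],
    "beer": [2, 0, 0, 0, 0, 1, 0, 1, 0],
    "car":  [0, 1, 1, 1, 0, 0, 1, 0, 0],
}

def cosine(u, v):
    """Cosine of the angle between two context vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(round(cosine(vectors["wine"], vectors["beer"]), 2))  # 0.46
print(round(cosine(vectors["wine"], vectors["car"]), 2))   # 0.38
print(round(cosine(vectors["beer"], vectors["car"]), 2))   # 0.0
```

Note that cos(beer, car) is exactly zero because the two vectors share no non-zero dimension at all, whereas wine and car overlap on the single dimension has.adj_red.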
Of course, we need not use syntactic relations as contextual features. Latent Semantic Analysis, for instance (Landauer and Dumais 1997), ignores syntax altogether. Instead, it counts the number of times each target word occurs in the documents that make up the corpus. For a newspaper corpus, we may determine the frequencies of our target words in each of the thousands of articles. These articles then form the dimensions of the context vectors, in the same way as the syntactic relations
above. The semantic relatedness between two target words is now again
