Applying word space models to sociolinguistics.
Religion names before and after 9/11.
Yves Peirsman, Kris Heylen and Dirk Geeraerts
Abstract
Researchers in disciplines like lexical semantics and critical discourse analysis are
in need of a quantitative method that allows them to model the distribution of a
word automatically. We advocate the use of word space models, a family of
approaches that were developed in the context of computational linguistics and
cognitive science, which represent the meaning of a word in terms of its contexts in a
large corpus. In a case study on the use of religious terms before and after the
attacks of September 11, 2001, we show how these models can be employed to
determine the semantic similarity and relatedness between two words, and the
factors that influence them. One of the patterns we uncover is the increased
association between Islam and terrorism in Dutch newspaper articles after 9/11, a trend
that is far less pronounced for Christianity. We also apply these new quantitative
instruments to explore the differences in word use between the five newspapers in
our corpus, and find a striking distinction between popular and quality newspapers.
Keywords: lexical semantics, word space models, semantic similarity, association,
religious terms, changes in word use
- Introduction
Of all computational-linguistic approaches to lexical semantics, word space
models currently set the trend (see e.g., Padó and Lapata 2007). Based on
the hypothesis that semantically similar words tend to be used in similar
contexts, these corpus-based approaches model the meaning of a word in
terms of the contexts in which it appears. They are applied to a wide variety
of computational tasks – from Question Answering and Information Retrieval
to automated essay scoring (Landauer and Dumais 1997) or the
modeling of human behavior in psycholinguistic experiments (Lowe and
McDonald 2000). In this article, we will argue that such word space models