Handbook of Psychology, Volume 4: Experimental Psychology



4. Do you know my sister?
5. I know the feeling!
6. His greed knew no limits.
7. I know Latin.
8. This child knows right from wrong.


Examples 3, 4, and 7 would seem to be clear examples of knowledge, but how does one draw the line? Suppose, however, that we knew what knowledge was. What, then, is its structure, and how is it organized? Semantic hierarchies, feature systems, schemas and scripts, or one huge associative net? All of these possibilities, and several more, have had their sponsors as well as their critics. But once again, suppose we had a workable model of what human knowledge structures are like. How could we then determine what the content of these structures actually is? There are two ways to do this: One can hand-code all knowledge, as is done in a dictionary or encyclopedia, except more systematically and more completely, or one can build a system that learns all it needs to know. We discuss an example of each approach; both have proven their usefulness for psychological research on discourse comprehension.


WordNet


WordNet is what a dictionary should be. Unlike most dictionaries, WordNet aspires to be a complete and exhaustive list of all word meanings or senses in the English language; it defines these meanings with a general phrase and some illustrative examples, and lists certain semantically related terms (Fellbaum, 1998; G. A. Miller, 1996). This is all done by hand coding. Each word in the language has an internal structure in WordNet, consisting of the syntactic categories of the word and, for each category, the number of different semantic senses (together with informal definitions and examples). Thus, the word bank is both a noun and a verb. For the noun, 10 senses are listed (the first two are the familiar financial institution and river bank; the 10th is a flight maneuver). The verb bank has seven senses in WordNet. Furthermore, each word (actually, each word sense) is related to other words by a number of semantic relationships that are specified in WordNet: synonymy (e.g., financial institution is a synonym of bank-1), coordinate relationship (lending institution is a coordinate term for bank-1), hyponymy (. . . is a kind of bank), holonymy (bank is part of . . .), and meronymy (parts of bank). Thus, a detailed, explicit description of the lexicon of the English language is achieved, structured by certain semantic relations.
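These relations can be inspected directly. The sketch below is a minimal illustration using the NLTK interface to WordNet, an assumed tool choice (WordNet itself is distributed in several forms); the exact sense numbering varies across WordNet versions.

```python
# Minimal sketch of querying WordNet through the NLTK interface.
# Assumes the corpus has been downloaded via nltk.download('wordnet');
# sense numbering can differ across WordNet versions.
from nltk.corpus import wordnet as wn

# All senses of "bank", across its syntactic categories:
for synset in wn.synsets('bank'):
    print(synset.name(), synset.pos(), synset.definition())

# Semantic relations for one noun sense (the financial
# institution; 'bank.n.02' in WordNet 3.0):
bank = wn.synset('bank.n.02')
print(bank.lemma_names())              # synonyms of this sense
print(bank.hyponyms())                 # hyponymy: ... is a kind of bank
print(bank.part_holonyms())            # holonymy: bank is part of ...
print(bank.part_meronyms())            # meronymy: parts of bank
print(bank.hypernyms()[0].hyponyms())  # coordinate terms (sisters under
                                       # the same hypernym)
```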
WordNet is a useful and widely used tool for psycholinguists and linguists. Nevertheless, it has certain limitations, some of which arise from the need for hand coding. WordNet is the reified intuition of its coders, limited by the chosen format (e.g., the semantic relations that are made explicit). But language changes, there are individual differences, and people can use words creatively in novel ways and be understood (E. V. Clark, 1997). The mental lexicon may not be static, as WordNet necessarily must be, but may evolve dynamically, and the context dependency of word meanings may be so strong as to make a listing of fixed senses illusory.
The task of hand coding a complete lexicon of the English language is certainly a daunting one. Hand coding all human knowledge presents significant additional difficulties. Nevertheless, the CYC system of Lenat and Guha (1990) attempts just that. (CYC is a very large database in which human knowledge is formally represented by a language called CycL. CYC is a registered trademark of Cycorp. The interested reader is directed to http://www.cyc.com/tech.html for more information.) CYC postulates that all human knowledge can be represented as a network of propositions. Thus, it has a local, propositional structure, as well as a global structure: the relations among propositions and the operations that these relations afford. Like WordNet, however, CYC is a static structure, always vulnerable because some piece of human knowledge has not been coded or acts in an unanticipated way in a new context.
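To make the idea of a propositional network concrete, the toy sketch below represents a few propositions as predicate-argument triples and follows their links. The predicates and constants are invented for illustration only; they are not actual CycL, and nothing here approaches CYC's scale or inference machinery.

```python
# Toy sketch of a propositional knowledge network. The predicates
# and constants are invented for illustration; this is not CycL.
PROPOSITIONS = [
    ("isa", "Bank", "FinancialInstitution"),
    ("isa", "FinancialInstitution", "Organization"),
    ("partOf", "Vault", "Bank"),
]

def categories_of(term, facts):
    """Global structure from local propositions: follow 'isa'
    links upward to every category a term belongs to."""
    for predicate, subject, obj in facts:
        if predicate == "isa" and subject == term:
            yield obj
            yield from categories_of(obj, facts)

print(list(categories_of("Bank", PROPOSITIONS)))
# -> ['FinancialInstitution', 'Organization']
```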
Therefore, some authors have argued for knowledge representations that learn what they need to know and thus are capable of keeping up with the demands of an ever-changing context. One such proposal is reviewed in the following section.

Latent Semantic Analysis

Latent semantic analysis (LSA) is a machine learning procedure that constructs a high-dimensional semantic space from an input consisting of a large amount of text (LSA is also discussed in this volume in the chapter by Treiman, Clifton, Meyer, & Wurm and in the chapter by Goldstone & Kersten). LSA analyzes the pattern of co-occurrences among words in many thousands of documents, using the well-known mathematical technique of singular value decomposition. This technique allows one to extract 300–500 dimensions of meaning that are capable of representing human semantic intuitions with considerable accuracy. LSA generates a semantic space in which words as well as sentences or whole texts are represented as mathematical vectors. The angle between two vectors (as measured by their cosine) provides a useful, fully automatic measure of the semantic similarity between the words they represent. Thus, we can compute the semantic similarity between any two word pairs or any two texts.
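As a concrete illustration, the sketch below runs this pipeline with scikit-learn, an assumed tool choice rather than the original LSA implementation. A real LSA space is trained on many thousands of documents with 300 to 500 dimensions; the four toy documents and two dimensions here only show the mechanics.

```python
# Minimal LSA sketch: term-document counts, singular value
# decomposition, and cosine similarity between word vectors.
# Real LSA uses thousands of documents and 300-500 dimensions;
# the corpus and k = 2 here are toys.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "the bank approved the loan and the deposit",
    "the savings bank hired a new loan officer",
    "the river bank was muddy after the flood",
    "flood water rose over the river levee",
]

# Term-document matrix: one row per word, one column per document.
vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(documents).T

# Singular value decomposition, keeping k dimensions of meaning.
svd = TruncatedSVD(n_components=2, random_state=0)
word_vectors = svd.fit_transform(term_doc)  # one k-dim vector per word

# The cosine of the angle between two word vectors measures their
# semantic similarity.
row = vectorizer.vocabulary_  # word -> row index
loan = word_vectors[row["loan"]].reshape(1, -1)
flood = word_vectors[row["flood"]].reshape(1, -1)
print(cosine_similarity(loan, flood)[0, 0])
```

A sentence or text vector can then be formed from the vectors of its words (in LSA, typically as their sum or centroid) and compared with the same cosine measure.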