text. HAL’s vocabulary consists of the 70,000 most frequently used symbols in the corpus. About half of these symbols have entries in the standard Unix dictionary; the remainder includes nonwords, misspellings, proper names, and slang. For ease of exposition, we refer to the 70,000 symbols as words. The methodology therefore produces a 70,000 × 70,000 matrix of co-occurrence values.
The co-occurrence matrix is constructed so that entries in each row specify the weighted frequency of co-occurrence of the row word and the words that preceded it in the window; entries in each column specify the weighted frequency of co-occurrence of the column word and the words that followed it in the window. Words that are closer together in the moving window get larger weights. Contiguous words receive a weight of 10; words separated by one intervening word receive a weight of 9; and so forth.
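As a concrete illustration, the following sketch builds such a weighted co-occurrence matrix in Python, assuming the 10-word window implied by the 10-to-1 weighting ramp just described. The miniature corpus and all variable names are ours, standing in for the large text corpus on which HAL was actually trained.

    import numpy as np

    # Miniature stand-in corpus and vocabulary (illustrative only).
    corpus = "the car drove down the street the truck drove down the road".split()
    vocab = sorted(set(corpus))
    index = {word: i for i, word in enumerate(vocab)}

    window = 10                                  # moving window of 10 words (assumed)
    counts = np.zeros((len(vocab), len(vocab)))  # co-occurrence matrix

    for pos, word in enumerate(corpus):
        # Look back up to `window` words; closer neighbors get larger weights
        # (10 for contiguous words, 9 with one intervening word, ..., 1).
        for lag in range(1, window + 1):
            if pos - lag < 0:
                break
            weight = window + 1 - lag
            prior = corpus[pos - lag]
            # Row entry: row word `word` with the preceding word `prior`; read
            # column-wise, the same cell records that `prior` was followed by `word`.
            counts[index[word], index[prior]] += weight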
The meaning of a word is captured in the 140,000-element vector obtained by concatenating the row and the column vector for that word. Each vector can be thought of as a point in a 140,000-dimensional space. The similarity in meaning between two words is defined as the Euclidean distance between their corresponding points in the space. An important property of HAL is that two words (e.g., street and road) can have very similar meanings because they occur in similar contexts and, hence, have similar meaning vectors, not because they appear frequently in the same sentence (cf. McKoon & Ratcliff, 1992).
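Continuing the same sketch (and reusing counts and index from it), a word’s meaning vector is the concatenation of its row and its column, and dissimilarity between two words is the Euclidean distance between those vectors; the function names below are ours.

    def meaning_vector(word):
        # Concatenate the word's row (preceding-word counts) and its column
        # (following-word counts) into a single meaning vector.
        i = index[word]
        return np.concatenate([counts[i, :], counts[:, i]])

    def hal_distance(word_a, word_b):
        # Smaller Euclidean distance corresponds to more similar meaning.
        return float(np.linalg.norm(meaning_vector(word_a) - meaning_vector(word_b)))

    print(hal_distance("street", "road"))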
HAL is a structural model of meaning and has no processing architecture. Hence, most of the evidence on the model consists of qualitative demonstrations or correlations between indices generated by the model and human behavior. For example, when distances between word vectors are computed and submitted to multidimensional scaling, the resulting scaling solutions indicate that words are grouped into sensible categories (e.g., Burgess & Lund, 2000). Other experiments have shown that interword distances computed in HAL predict priming in lexical decision, to a reasonable approximation (e.g., Lund, Burgess, & Audet, 1996).
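A miniature version of the scaling demonstration could compute pairwise HAL distances and hand them to an off-the-shelf multidimensional scaling routine. The sketch below uses scikit-learn’s MDS, which is our choice of tool rather than necessarily the one used in the cited work, and reuses meaning_vector from the sketch above.

    from sklearn.manifold import MDS

    words = ["street", "road", "car", "truck", "drove", "down"]
    # Pairwise Euclidean distances between HAL meaning vectors.
    D = np.array([[np.linalg.norm(meaning_vector(a) - meaning_vector(b))
                   for b in words] for a in words])

    # Project into two dimensions for inspection; words with similar
    # meanings should end up near one another.
    coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
    for word, (x, y) in zip(words, coords):
        print(f"{word:>6s}  {x:8.2f}  {y:8.2f}")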


Latent Semantic Analysis (LSA). The overarching goal of the LSA model (e.g., Landauer, 1998; Landauer & Dumais, 1997; see also the chapter by Butcher & Kintsch) is to explain Plato’s paradox: Why do people appear to know so much more than they could have learned from the experiences they have had? Like HAL, LSA is a high-dimensional spatial model of meaning representation. Concepts in LSA are represented by vectors in a space of approximately 300 dimensions. Similarities between meanings of concepts are represented by cosines of angles between vectors.
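The cosine measure itself is simply the normalized dot product of two concept vectors. A minimal version follows, with random 300-dimensional vectors standing in for LSA’s learned representations.

    import numpy as np

    def cosine(a, b):
        # Cosine of the angle between two concept vectors: near 1 for very
        # similar meanings, near 0 for unrelated ones.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Random stand-ins for two 300-dimensional LSA concept vectors.
    rng = np.random.default_rng(0)
    v1, v2 = rng.normal(size=300), rng.normal(size=300)
    print(cosine(v1, v2))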
The input to LSA is a matrix in which rows represent types of events and columns represent contexts in which instances of the events occur. In many applications, for example, the rows correspond to word types and the columns correspond to samples of text (e.g., paragraphs) in which instances of the words appear. Each cell in the matrix contains the number of times that a particular word type appears in a particular context. This matrix is analyzed using singular value decomposition (SVD), which is similar to factor analysis. This analysis allows event types and contexts to be represented as points or vectors in a high-dimensional space. In this new representation, the similarities between any pairs of items can be computed.
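A toy version of this pipeline, using a handful of invented one-sentence “contexts” and retaining only two dimensions rather than LSA’s roughly 300, might look as follows; the corpus and parameters are ours, and cosine is the function defined above.

    # Toy word-by-context count matrix: rows are word types, columns are contexts.
    contexts = [
        "the doctor examined the patient",
        "the nurse helped the doctor",
        "the car drove down the road",
        "the truck drove down the street",
    ]
    vocab = sorted({w for c in contexts for w in c.split()})
    X = np.zeros((len(vocab), len(contexts)))
    for j, c in enumerate(contexts):
        for w in c.split():
            X[vocab.index(w), j] += 1

    # Singular value decomposition; keep only the k most important dimensions.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    word_vectors = U[:, :k] * s[:k]   # each row is one word in the reduced space

    street, road = word_vectors[vocab.index("street")], word_vectors[vocab.index("road")]
    print(cosine(street, road))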
In one specific implementation, samples of text were taken from an electronic version of an encyclopedia containing 30,473 articles. From each article, a sample was taken consisting of the whole text or its first 2,000 characters, whichever was less. The text data were placed in a matrix of 30,473 columns, each representing a text sample, and 60,768 rows, each representing a word that had appeared in at least two samples. The cells in the matrix contained the frequency with which a word appeared in a particular sample. After transforming the raw cell frequencies, the matrix was submitted to SVD and the 300 most important dimensions were retained. Thus, each word and each context could be represented as a vector in a 300-dimensional space.
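The chapter does not say which transform was applied to the raw frequencies. One weighting commonly used in LSA work, log-entropy, dampens raw counts and down-weights words whose occurrences are spread evenly across many contexts (e.g., the). The sketch below shows that weighting as one plausible reading, applied to the toy matrix X from the previous snippet; it is not claimed to be the exact transform used in the encyclopedia study.

    def log_entropy(X):
        # Local weight: log-dampened counts. Global weight: 1 plus the word's
        # normalized (negative) entropy over contexts, so words that occur
        # evenly in every context contribute almost nothing.
        n_contexts = X.shape[1]
        p = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
        with np.errstate(divide="ignore", invalid="ignore"):
            plogp = np.where(p > 0, p * np.log(p), 0.0)
        global_weight = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_contexts)
        return np.log(X + 1.0) * global_weight

    # Weight the counts before the SVD step shown earlier.
    X_weighted = log_entropy(X)
    U, s, Vt = np.linalg.svd(X_weighted, full_matrices=False)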
LSA has been applied to a varied set of problems. In one application, the model’s word knowledge after training was tested using items from the synonym portion of the Test of English as a Foreign Language (TOEFL). Each problem consisted of a target word and four answer options, from which the test taker was supposed to choose the one with the meaning most similar to the target. The model’s choices were determined by computing cosines between vector representations of the target words in each item and vector representations of the answer options, and choosing the option with the largest cosine. The model performed as well as applicants to U.S. colleges from non-English-speaking countries, getting 64.4% correct.
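In code, the choice procedure amounts to an argmax over cosines between the target’s vector and each option’s vector. The item below is a stand-in built from the toy vocabulary rather than an actual TOEFL item, and it reuses word_vectors, vocab, and cosine from the sketches above.

    def choose_synonym(target, options):
        # Pick the option whose reduced-space vector has the largest cosine
        # with the target word's vector.
        tv = word_vectors[vocab.index(target)]
        scores = [cosine(tv, word_vectors[vocab.index(o)]) for o in options]
        return options[int(np.argmax(scores))]

    print(choose_synonym("road", ["street", "doctor", "patient", "nurse"]))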
Another application of the model simulated the acquisition of vocabulary by school-aged children. The model gained vocabulary at about the same rate as do seventh-grade students, approximately 10 words per day. This rate greatly exceeds learning rates that have been obtained in experimental attempts to teach children word meanings from context. An important finding in this analysis was that LSA’s learning of vocabulary relies heavily on indirect learning: The estimated direct effect of reading a sample of text (e.g., a paragraph) on knowledge of words in the sample was an increase of approximately 0.05 words of total vocabulary, whereas the indirect effect of reading a sample of text on words not contained in the sample was an increase of approximately 0.15 words of total vocabulary. Put another way,