text. HAL’s vocabulary consists of the 70,000 most frequently used symbols in the corpus. About half of these symbols have entries in the standard Unix dictionary; the remainder includes nonwords, misspellings, proper names, and slang. For ease of exposition, we refer to the 70,000 symbols as words. The methodology therefore produces a 70,000 × 70,000 matrix of co-occurrence values.
The co-occurrence matrix is constructed so that entries in each row specify the weighted frequency of co-occurrence of the row word and the words that preceded it in the window; entries in each column specify the weighted frequency of co-occurrence of the column word and the words that followed it in the window. Words that are closer together in the moving window get larger weights. Contiguous words receive a weight of 10; words separated by one intervening word receive a weight of 9; and so forth.
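As a concrete illustration, the following sketch builds such a weighted co-occurrence matrix in Python, assuming the 10-word window implied by the 10-to-1 weighting ramp just described. The miniature corpus and all variable names are ours, standing in for the large text corpus on which HAL was actually trained.

    import numpy as np

    # Miniature stand-in corpus and vocabulary (illustrative only).
    corpus = "the car drove down the street the truck drove down the road".split()
    vocab = sorted(set(corpus))
    index = {word: i for i, word in enumerate(vocab)}

    window = 10                                  # moving window of 10 words (assumed)
    counts = np.zeros((len(vocab), len(vocab)))  # co-occurrence matrix

    for pos, word in enumerate(corpus):
        # Look back up to `window` words; closer neighbors get larger weights
        # (10 for contiguous words, 9 with one intervening word, ..., 1).
        for lag in range(1, window + 1):
            if pos - lag < 0:
                break
            weight = window + 1 - lag
            prior = corpus[pos - lag]
            # Row entry: row word `word` with the preceding word `prior`; read
            # column-wise, the same cell records that `prior` was followed by `word`.
            counts[index[word], index[prior]] += weight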
The meaning of a word is captured in the 140,000-element vector obtained by concatenating the row and the column vector for that word. Each vector can be thought of as a point in a 140,000-dimensional space. The similarity in meaning between two words is defined as the Euclidean distance between their corresponding points in the space. An important property of HAL is that two words (e.g., street and road) can have very similar meanings because they occur in similar contexts and, hence, have similar meaning vectors, not because they appear frequently in the same sentence (cf. McKoon & Ratcliff, 1992).
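Continuing the same sketch (and reusing counts and index from it), a word’s meaning vector is the concatenation of its row and its column, and dissimilarity between two words is the Euclidean distance between those vectors; the function names below are ours.

    def meaning_vector(word):
        # Concatenate the word's row (preceding-word counts) and its column
        # (following-word counts) into a single meaning vector.
        i = index[word]
        return np.concatenate([counts[i, :], counts[:, i]])

    def hal_distance(word_a, word_b):
        # Smaller Euclidean distance corresponds to more similar meaning.
        return float(np.linalg.norm(meaning_vector(word_a) - meaning_vector(word_b)))

    print(hal_distance("street", "road"))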
HAL is a structural model of meaning and has no processing architecture. Hence, most of the evidence on the model consists of qualitative demonstrations or correlations between indices generated by the model and human behavior. For example, when distances between word vectors are computed and submitted to multidimensional scaling, the resulting scaling solutions indicate that words are grouped into sensible categories (e.g., Burgess & Lund, 2000). Other experiments have shown that interword distances computed in HAL predict priming in lexical decision, to a reasonable approximation (e.g., Lund, Burgess, & Audet, 1996).
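A miniature version of the scaling demonstration could compute pairwise HAL distances and hand them to an off-the-shelf multidimensional scaling routine. The sketch below uses scikit-learn’s MDS, which is our choice of tool rather than necessarily the one used in the cited work, and reuses meaning_vector from the sketch above.

    from sklearn.manifold import MDS

    words = ["street", "road", "car", "truck", "drove", "down"]
    # Pairwise Euclidean distances between HAL meaning vectors.
    D = np.array([[np.linalg.norm(meaning_vector(a) - meaning_vector(b))
                   for b in words] for a in words])

    # Project into two dimensions for inspection; words with similar
    # meanings should end up near one another.
    coords = MDS(n_components=2, dissimilarity="precomputed", random_state=0).fit_transform(D)
    for word, (x, y) in zip(words, coords):
        print(f"{word:>6s}  {x:8.2f}  {y:8.2f}")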


Latent Semantic Analysis (LSA). The overarching goal of the LSA model (e.g., Landauer, 1998; Landauer & Dumais, 1997; see also the chapter by Butcher & Kintsch) is to explain Plato’s paradox: Why do people appear to know so much more than they could have learned from the experiences they have had? Like HAL, LSA is a high-dimensional spatial model of meaning representation. Concepts in LSA are represented by vectors in a space of approximately 300 dimensions. Similarities between meanings of concepts are represented by cosines of angles between vectors.
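The cosine measure itself is simply the normalized dot product of two concept vectors. A minimal version follows, with random 300-dimensional vectors standing in for LSA’s learned representations.

    import numpy as np

    def cosine(a, b):
        # Cosine of the angle between two concept vectors: near 1 for very
        # similar meanings, near 0 for unrelated ones.
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Random stand-ins for two 300-dimensional LSA concept vectors.
    rng = np.random.default_rng(0)
    v1, v2 = rng.normal(size=300), rng.normal(size=300)
    print(cosine(v1, v2))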
The input to LSA is a matrix in which rows represent types of events and columns represent contexts in which instances of the events occur. In many applications, for example, the rows correspond to word types and the columns correspond to samples of text (e.g., paragraphs) in which instances of the words appear. Each cell in the matrix contains the number of times that a particular word type appears in a particular context. This matrix is analyzed using singular value decomposition (SVD), which is similar to factor analysis. This analysis allows event types and contexts to be represented as points or vectors in a high-dimensional space. In this new representation, the similarities between any pairs of items can be computed.
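A toy version of this pipeline, using a handful of invented one-sentence “contexts” and retaining only two dimensions rather than LSA’s roughly 300, might look as follows; the corpus and parameters are ours, and cosine is the function defined above.

    # Toy word-by-context count matrix: rows are word types, columns are contexts.
    contexts = [
        "the doctor examined the patient",
        "the nurse helped the doctor",
        "the car drove down the road",
        "the truck drove down the street",
    ]
    vocab = sorted({w for c in contexts for w in c.split()})
    X = np.zeros((len(vocab), len(contexts)))
    for j, c in enumerate(contexts):
        for w in c.split():
            X[vocab.index(w), j] += 1

    # Singular value decomposition; keep only the k most important dimensions.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = 2
    word_vectors = U[:, :k] * s[:k]   # each row is one word in the reduced space

    street, road = word_vectors[vocab.index("street")], word_vectors[vocab.index("road")]
    print(cosine(street, road))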
In one specific implementation, samples of text were taken from an electronic version of an encyclopedia containing 30,473 articles. From each article, a sample was taken consisting of the whole text or its first 2,000 characters, whichever was less. The text data were placed in a matrix of 30,473 columns, each representing a text sample, and 60,768 rows, each representing a word that had appeared in at least two samples. The cells in the matrix contained the frequency with which a word appeared in a particular sample. After transforming the raw cell frequencies, the matrix was submitted to SVD and the 300 most important dimensions were retained. Thus, each word and each context could be represented as a vector in a 300-dimensional space.
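The chapter does not say which transform was applied to the raw frequencies. One weighting commonly used in LSA work, log-entropy, dampens raw counts and down-weights words whose occurrences are spread evenly across many contexts (e.g., the). The sketch below shows that weighting as one plausible reading, applied to the toy matrix X from the previous snippet; it is not claimed to be the exact transform used in the encyclopedia study.

    def log_entropy(X):
        # Local weight: log-dampened counts. Global weight: 1 plus the word's
        # normalized (negative) entropy over contexts, so words that occur
        # evenly in every context contribute almost nothing.
        n_contexts = X.shape[1]
        p = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
        with np.errstate(divide="ignore", invalid="ignore"):
            plogp = np.where(p > 0, p * np.log(p), 0.0)
        global_weight = 1.0 + plogp.sum(axis=1, keepdims=True) / np.log(n_contexts)
        return np.log(X + 1.0) * global_weight

    # Weight the counts before the SVD step shown earlier.
    X_weighted = log_entropy(X)
    U, s, Vt = np.linalg.svd(X_weighted, full_matrices=False)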
LSA has been applied to a varied set of problems. In one application, the model’s word knowledge after training was tested using items from the synonym portion of the Test of English as a Foreign Language (TOEFL). Each problem consisted of a target word and four answer options, from which the test taker was supposed to choose the one with the meaning most similar to the target. The model’s choices were determined by computing cosines between vector representations of the target words in each item and vector representations of the answer options, and choosing the option with the largest cosine. The model performed as well as applicants to U.S. colleges from non-English-speaking countries, getting 64.4% correct.
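In code, the choice procedure amounts to an argmax over cosines between the target’s vector and each option’s vector. The item below is a stand-in built from the toy vocabulary rather than an actual TOEFL item, and it reuses word_vectors, vocab, and cosine from the sketches above.

    def choose_synonym(target, options):
        # Pick the option whose reduced-space vector has the largest cosine
        # with the target word's vector.
        tv = word_vectors[vocab.index(target)]
        scores = [cosine(tv, word_vectors[vocab.index(o)]) for o in options]
        return options[int(np.argmax(scores))]

    print(choose_synonym("road", ["street", "doctor", "patient", "nurse"]))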
Another application of the model simulated the acquisition of vocabulary by school-aged children. The model gained vocabulary at about the same rate as do seventh-grade students, approximately 10 words per day. This rate greatly exceeds learning rates that have been obtained in experimental attempts to teach children word meanings from context. An important finding in this analysis was that LSA’s learning of vocabulary relies heavily on indirect learning: The estimated direct effect of reading a sample of text (e.g., a paragraph) on knowledge of words in the sample was an increase of approximately 0.05 words of total vocabulary, whereas the indirect effect of reading a sample of text on words not contained in the sample was an increase of approximately 0.15 words of total vocabulary. Put another way,