Stanford University and was published in 1916 as the Stanford
Revision and Extension of the Binet-Simon Intelligence
Scale, soon to become known as the Stanford-Binet (Terman,
1916). Subsequent modifications and restandardization over
the years produced several further versions of this measure,
the most recent of which was published as the Fourth Edition
Stanford-Binet in 1986 (Thorndike, Hagen, & Sattler, 1986).
Central to the conceptual basis and empirical standardiza-
tion of the Stanford-Binet is a focus on normative age-related
expectations for performance on its component tasks, which
makes it possible to translate successes and failures on these
tasks into a mental-age equivalent. While Terman was collect-
ing his standardization data, William Stern (1871–1938) ad-
vanced the notion that a “mental quotient” could be calculated
for respondents by dividing their mental age by their
chronological age and multiplying the result by 100 (Stern, 1914).
Terman endorsed this notion and included Stern’s calculation
in the 1916 Stanford-Binet. However, he decided to rename this
number an “intelligence quotient,” introducing the term IQ into
the language of psychology and into vocabularies worldwide.
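
As a worked illustration of the quotient, the following minimal Python sketch computes a ratio IQ as mental age divided by chronological age, multiplied by 100. The function name `ratio_iq` and the sample ages are hypothetical, chosen only for illustration; they are not drawn from the 1916 Stanford-Binet materials.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Return a ratio IQ: mental age over chronological age, times 100.

    Both ages must be expressed in the same unit (for example, years).
    """
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return 100.0 * mental_age / chronological_age


if __name__ == "__main__":
    # Hypothetical child: performs like a typical 12-year-old at age 10.
    print(ratio_iq(12, 10))  # 120.0
    # Hypothetical child whose mental age equals chronological age.
    print(ratio_iq(8, 8))    # 100.0
```

A respondent whose mental age matches chronological age scores 100 under this calculation, which is why 100 serves as the reference point for average performance.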
Group-Administered Tests
Just one year after publication of the Stanford-Binet, public
duty once more shaped the development of intelligence test-
ing. The entry of the United States into World War I in 1917
generated a pressing need to draft and train a large number of
young men who could quickly be transformed from city boys
and farm boys into the “doughboys” who served in the
trenches. It would facilitate this process to have a measure of
intelligence that could be administered to large numbers
of recruits at a single sitting and help screen out those whose
intellectual limitations would prevent them from functioning
competently in the military, while also identifying those with
above average abilities who could be trained for positions of
responsibility. Robert Yerkes (1876–1956), then president of
the American Psychological Association, responded to the
war effort by chairing a Committee on the Psychological
Examination of Recruits, on which Terman was asked to
serve. Coincidentally, one of Terman’s graduate students,
Arthur Otis (1886–1963), had been working to develop a
group intelligence test. Otis shared his work with Yerkes’
committee, which drew heavily on it to produce what came to
be known as the Army Alpha test. The Army Alpha test was
the first group-administered intelligence test and, as noted by
Haney (1981), it was constructed quickly enough to be given
to almost two million recruits by war’s end.
As a language-based instrument that required respondents
to read instructions, however, the Army Alpha was not
suitable for assessing recruits who were illiterate or, being re-
cent immigrants to the United States, had little command of
English. This limitation of the Army Alpha led to creation of
the Army Beta, which was based on testing procedures previ-
ously developed for use with deaf persons and consisted of
nonverbal tasks that could be administered through pan-
tomime instructions, without use of language. The Army
Beta’s attention to groups with special needs foreshadowed
later attention to culture-related sources of bias in psycholog-
ical assessment and to the importance of multicultural sensi-
tivity in developing and using tests (see Dana, 2000; Suzuki,
Ponterotto, & Meller, 2000). Following the war, group testing
of intelligence continued in the form of several different mea-
sures adapted for civilian use. One of the first, fittingly
enough, was the Otis Classification Test (Otis, 1923).
The Wechsler Scales
The Stanford-Binet was the first systematically formulated
and standardized measure of intelligence, and for many years
it was by far the most commonly used method of evaluating
intelligence in both young people and adults. The kinds of
tasks designed by Binet have continued to the present day to
provide the foundation on which most other tests of intelli-
gence have been based. Beginning in the late 1930s, how-
ever, a new thread in the history of intelligence testing was
woven by David Wechsler (1896–1981), then chief psychol-
ogist at Bellevue Hospital in New York City. Wechsler saw
shortcomings in defining intelligence by the ratio of mental
age to chronological age, especially in the evaluation of
adults, and he developed instead a method of determining IQ
on the basis of comparing test scores with the normative dis-
tribution of these scores among people in various age groups.
The instrument he constructed borrowed subtests from the
Stanford-Binet, the Army Alpha and Beta, and some other ex-
isting scales, and thus it was not new in substance. What was
new was the statistical formulation of IQ as having a mean of
100 and a standard deviation of 15, which in turn led to the
widely accepted convention of translating IQ scores into
percentile ranks.
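
The deviation approach described above can be sketched in a few lines of Python. This is a simplified illustration, not Wechsler's actual scoring procedure: it assumes a single raw score, a hypothetical age-group mean and standard deviation supplied by the caller, and a normal distribution of IQ scores for the percentile conversion. The function names `deviation_iq` and `percentile_rank` are invented for this example.

```python
from statistics import NormalDist

IQ_MEAN = 100.0  # mean of the IQ scale, as stated in the text
IQ_SD = 15.0     # standard deviation of the IQ scale, as stated in the text


def deviation_iq(raw_score: float, group_mean: float, group_sd: float) -> float:
    """Rescale a raw score to the IQ metric using age-group norms.

    group_mean and group_sd are hypothetical norms for the
    respondent's own age group.
    """
    if group_sd <= 0:
        raise ValueError("group_sd must be positive")
    z = (raw_score - group_mean) / group_sd  # distance from the age-group mean in SD units
    return IQ_MEAN + IQ_SD * z


def percentile_rank(iq: float) -> float:
    """Return the percentage of a normal IQ distribution at or below iq.

    Assumes IQ scores are normally distributed (an assumption of this sketch).
    """
    return 100.0 * NormalDist(mu=IQ_MEAN, sigma=IQ_SD).cdf(iq)


if __name__ == "__main__":
    # Hypothetical respondent scoring one SD above the age-group mean.
    iq = deviation_iq(raw_score=60, group_mean=50, group_sd=10)
    print(iq, round(percentile_rank(iq), 1))
```

Because the score is defined relative to same-age peers, the result does not depend on a mental-age estimate, which is what makes the approach workable for adults.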
Also innovative was Wechsler’s belief that intellectual ca-
pacities constitute an integral feature of personality function-
ing, from which it followed that a well-designed intelligence
test could provide useful information beyond the implications
of an overall IQ score. Wechsler postulated that the pattern of
relative strengths and weaknesses across subtests measuring
different kinds of mental abilities could be used to identify
normal and abnormal variations in numerous cognitive char-
acteristics and coping capacities. Published as the Wechsler-
Bellevue, Wechsler’s (1939) test gradually replaced the
Stanford-Binet as the most widely used measure of adult