The Mismeasure of Man by Stephen Jay Gould

THE HEREDITARIAN THEORY OF IQ 207

looked. The purpose of the tests is to tell us what we do not already know,
and it would be a mistake to test only those pupils who are recognized as
obviously below or above average. Some of the biggest surprises are
encountered in testing those who have been looked upon as close to average
in ability. Universal testing is fully warranted (1923, p. 22).

The Stanford-Binet, like its parent, remained a test for individuals,
but it became the paradigm for virtually all the written versions
that followed. By careful juggling and elimination, Terman
standardized the scale so that "average" children would score 100
at each age (mental age equal to chronological age). Terman also
evened out the variation among children by establishing a standard
deviation of 15 or 16 points at each chronological age. With its
mean of 100 and standard deviation of 15, the Stanford-Binet
became (and in many respects remains to this day) the primary
criterion for judging a plethora of mass-marketed written tests that
followed. The invalid argument runs: we know that the Stanford-Binet
measures intelligence; therefore, any written test that correlates
strongly with Stanford-Binet also measures intelligence. Much
of the elaborate statistical work performed by testers during the
past fifty years provides no independent confirmation for the
proposition that tests measure intelligence, but merely establishes
correlation with a preconceived and unquestioned standard.

Testing soon became a multimillion-dollar industry; marketing
companies dared not take a chance with tests not proven by their
correlation with Terman's standard. The Army Alpha (see pp.
222-252) initiated mass testing, but a flood of competitors greeted
school administrators within a few years after the war's end. A
quick glance at the advertisements appended to Terman's later
book (1923) illustrates, dramatically and unintentionally, how all
Terman's cautious words about careful and lengthy assessment
(1919, p. 299, for example) could evaporate before strictures of
cost and time when his desire to test all children became a reality
(Fig. 5.3). Thirty minutes and five tests might mark a child for life,
if schools adopted the following examination, advertised in Terman
1923, and constructed by a committee that included Thorndike,
Yerkes, and Terman himself.

* This, in itself, is not finagling, but a valid statistical procedure for establishing
uniformity of average score and variance across age levels.
