Statistical Analysis for Education and Psychology Researchers

study. More recently this path has been followed by the national curriculum development
and testing programme in the United Kingdom.
The other major category of tests based in the psychometric or psychological
measurement tradition is the norm-referenced test. The emphasis in this type of test is on
the relative standing of an individual’s overall score. An individual’s score is interpreted relative to
those of other individuals. A normative test score provides an indication of an
individual’s standing relative to other individuals, for example an individual’s IQ
(intelligence quotient) score. Such a score would be interpreted by comparing it with
those of a representative sample of individuals. The researcher has to decide what is
meant by representative. Usually this refers to individuals of similar age, gender and
perhaps ethnicity. Representative sets of scores for defined groups of individuals, for
example, age group 6–7 years, are usually presented in a test standardization manual and
are referred to as tables of norms. Many norm tables present information about the
standard reference group as the percentage of individuals in the reference group who
score lower than the particular individual’s test score to which reference is being made.
Standard reference scores presented in this way are called percentile norms.
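
The idea of a percentile norm can be illustrated with a minimal sketch. It assumes the definition given above (the percentage of the reference group scoring lower than the individual); the function name `percentile_rank` and the reference scores are hypothetical and are not taken from any published norm table.

```python
from bisect import bisect_left


def percentile_rank(score, reference_scores):
    """Return the percentage of reference-group scores strictly below `score`."""
    if not reference_scores:
        raise ValueError("reference group must not be empty")
    ordered = sorted(reference_scores)
    # bisect_left gives the number of values strictly less than `score`.
    below = bisect_left(ordered, score)
    return 100.0 * below / len(ordered)


# Hypothetical raw scores for a reference group (illustration only).
reference_group = [12, 15, 15, 18, 20, 21, 23, 25, 27, 30]

# Six of the ten reference scores are lower than 23.
print(percentile_rank(23, reference_group))  # 60.0
```

Some test manuals use a slightly different convention, for example counting half of any tied scores as lower, so the manual's own definition should be checked before comparing values.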
As with a standardized test score from a normative test, a single attainment mark from
an educational test seldom has an absolute meaning. The significance of a datum point
(single mark) can best be interpreted in the context of other marks or scores and in that
sense is relative. These other marks may represent other pupils’ scores on the same test or
an average normative reference score for a particular age group. A single datum point or
achievement score for a pupil acquires meaning only when it is interpreted together with
other data such as achievement scores obtained by individuals on the same test or when
compared with achievement norms. To help assess the relative importance of a particular
datum point or score one can examine a data set graphically, paying particular attention
to both the spread of scores and summary measures of central tendency such as a mean or
median. Measures of spread or dispersion and central tendency are examples of
descriptive statistics and are helpful when summarizing a distribution of scores. Any
individual score can then be compared with the average for that distribution.
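
As a rough sketch of how such summary measures might be computed, the example below uses Python's standard `statistics` module. The marks are invented for illustration, and the helper names `summarize` and `deviation_from_mean` are assumptions, not part of the original text.

```python
import statistics


def summarize(scores):
    """Return simple measures of central tendency and spread for a set of scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "range": max(scores) - min(scores),
        # Sample standard deviation (n - 1 in the denominator).
        "sd": statistics.stdev(scores),
    }


def deviation_from_mean(score, scores):
    """Return how far a single score lies above (+) or below (-) the group mean."""
    return score - statistics.mean(scores)


# Hypothetical marks for eight pupils on the same test (illustration only).
marks = [40, 50, 55, 60, 62, 65, 70, 78]

summary = summarize(marks)
print(summary["mean"], summary["median"], summary["range"])

# A mark of 70 lies 10 points above the group mean of 60.
print(deviation_from_mean(70, marks))
```

Reading the deviation alongside the standard deviation shows whether a gap of that size is large or small relative to the spread of the group.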


Choosing a Standardized Test

Most of the tests referred to so far are existing tests which have been carefully developed
and evaluated—what are called ‘off the shelf’ tests. Helpful sources of information about
these tests are the Buros Mental Measurements Yearbooks produced about every five
years and test publishers’ catalogues. Standardized, ‘off the shelf’ tests are published with
norm tables and validity and reliability coefficients. Choice of a test should relate to the
variables and underlying constructs one wishes to measure. One should consider the
characteristics of the test, the characteristics of the testees, and how the test information
will be used: to show change, for selection, or for prediction. For whatever purpose a test is
selected, it should be valid and reliable. If a normative test is used, it should have
adequate norms. If a criterion test is used, the test content should be relevant to the
purpose of testing; in the case of an achievement test, for example, it should be relevant
to learning objectives. A good test manual should provide most of this psychometric
information. A straightforward guide to selecting the best test is given by Kline (1990).

