statistical indices. Publishers’ manuals should provide information about the reliability evidence for an
assessment and the relevant statistical indices.
A variety of factors can influence the reliability of an assessment. For example, if a test is
administered in an extremely hot or noisy room, students may not be able to complete the test to the
best of their ability. If students are asked to give an oral presentation when the instructions or
expectations have not been made clear, the reliability of the performance assessment suffers.
A number of other factors, including students' health, stress levels, and motivation, can affect
the reliability of an assessment.
Teachers should use their judgment in interpreting
assessment results when they suspect students are not
able to perform to the best of their abilities. It is equally
important for teachers to understand that a test or
performance assessment may be reliable but not valid.
For example, a student may consistently do well on an
assessment, but the assessment may not be measuring
what it claims to measure.
Freedom from Bias
Bias can occur in test design or the way results are interpreted and used. Bias systematically
disadvantages a student or group of students so that they are unable to accurately show what
they know and can do with respect to the content of the assessment. As a result, the assessment
results may underestimate the students’ achievement or reflect abilities that are not related to the
assessment’s content (Abedi and Lord 2001). Bias arises from tests that favor students of a particular
gender, ethnicity, cultural background, geographic location, disability, or primary language. An
assessment that is free from bias produces the same scores for students of the same attainment level,
irrespective of their demographic subgroup.
Popham (1995) identifies two forms of bias: offensiveness and unfair penalization. Offensiveness
occurs when the content of an assessment offends, upsets, or distresses particular subgroups, thus
negatively influencing the test performance of these students. Items that present stereotypes of girls,
boys, or particular cultures, or that portray certain groups as inferior, could adversely affect certain
students’ performance.
Unfair penalization occurs when the test content makes the test more difficult for some students
than for others. Bias may occur, for example, if a test includes vocabulary that is unfamiliar to students
because of their culture or geographic location. Bias may also occur if the test contains images that
are more familiar to one group than another, or demands language skills beyond those of the targeted
students. For example, if a reading assessment contains vocabulary related to rural life, then inner-city
students are potentially more disadvantaged than rural students. In addition, bias occurs when
assessments that are based on letter-sound principles are used with students who do not have access
to the sounds of language (i.e., students who are deaf or hard of hearing).
Assessment developers typically go to great lengths to ensure that assessment items are not
biased. Examine the publisher's manual for evidence that items have been reviewed to guard
against bias.
Validity, reliability, and freedom from bias are all necessary conditions for sound assessment; they
are not interchangeable (Linn and Miller 2005). For example, an assessment may offer consistent
results (high reliability) without measuring what was targeted (low validity); conversely, a measurement
with all the hallmarks of validity may not have high reliability. The key points of technical quality are
summarized in figure 8.12.
Assessment Chapter 8 | 869