English Language Development

This section elaborates on the intended purposes of assessment. It is particularly important to refer to
this section when selecting assessments other than California-mandated assessments (e.g., the Smarter
Balanced Summative Assessments), whose technical quality is established through rigorous studies.

Elements of Technical Quality
The technical quality of an assessment refers to the accuracy of the information it yields and
the appropriateness of the assessment for its intended purposes. There are three
important elements related to the technical quality of assessments: validity, reliability, and freedom
from bias (AERA, APA, and NCME 1999). Each element is described here, and figure 8.12, which
summarizes the key points for each, is included at the end of this section.


Validity
Validity is the overarching concept that defines quality in educational measurement. It is the extent
to which an assessment permits appropriate inferences about student learning and contributes to
the adequacy and appropriateness of using assessment results for specific decision-making purposes
(Herman, Heritage, and Goldschmidt 2011). No assessment
is valid for all purposes. While people often refer to the
validity of a test, it is more correct to refer to the validity
of the inferences or interpretations that can be made
from the results of a test. Validity is a matter of
degree; depending on its purpose, an assessment can have high,
moderate, or low validity. For example, a diagnostic reading
test might have a high degree of validity for identifying the
type of decoding problems a student is having, a moderate
degree for diagnosing comprehension problems, a low
degree for identifying vocabulary knowledge difficulties, and
no validity for diagnosing writing conventions difficulties.
Similarly, annual assessments at the end of sixth grade have a high degree of validity for assessing
achievement of standards for those students but no validity for assessing the achievement of the
incoming group of sixth graders.
For an assessment to be valid for the intended purpose, there should be evidence that it does,
in fact, assess what it purports to assess. Test publisher manuals should include information about
the types of validity evidence that have been collected to support the intended uses specified for the
assessment.

Reliability
Reliability refers to how consistently an assessment measures what it is intended to measure (Linn
and Miller 2005). If an assessment is reliable, the results should be replicable. For instance, changes
in the time of administration, day and time of scoring, who
scores the assessment, and the sample of assessment items
should not create inconsistencies in results.
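The relationship between test length and consistency, discussed below as a rule of thumb, has a classical quantitative form: the Spearman-Brown prophecy formula, which predicts how reliability changes when a test is lengthened or shortened. The sketch below is illustrative only and is not part of the framework; the function name and the example values are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a test whose length is changed by
    length_factor (e.g., 2.0 means the number of items is doubled),
    given its current reliability coefficient (0 to 1).
    Classic Spearman-Brown prophecy formula."""
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )

# Hypothetical example: a 10-item test with reliability 0.60,
# doubled to 20 comparable items.
doubled = spearman_brown(0.60, 2.0)  # (1.2 * 0.6) / 1.6 = 0.75
```

Note that the predicted reliability rises (0.60 to 0.75) as items are added, consistent with the rule of thumb that longer assessments tend to be more reliable, assuming the added items measure the same construct with comparable quality.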
Reliability is important because it is a necessary adjunct
of assessment validity (Linn and Miller 2005). If assessment
results are not consistent, then it is reasonable to conclude
that the results do not accurately measure what the
assessment is purported to measure. A general rule of thumb
for reliability is that the more items an assessment includes,
the higher its reliability. Reliability is assessed primarily with



868 | Chapter 8 Assessment