English Language Development

This section elaborates on the intended purposes of assessment. It is particularly important to refer to
this section when selecting assessments other than California-mandated assessments (e.g., the Smarter
Balanced Summative Assessments), whose technical quality is established through rigorous studies.

Elements of Technical Quality
The technical quality of an assessment refers to the accuracy of the information it yields and
the appropriateness of the assessment for its intended purposes. There are three
important elements related to the technical quality of assessments: validity, reliability, and freedom
from bias (AERA, APA, and NCME 1999). Each element is described here, and figure 8.12, which
summarizes the key points for each, is included at the end of this section.


Validity
Validity is the overarching concept that defines quality in educational measurement. It is the extent
to which an assessment permits appropriate inferences about student learning and contributes to
the adequacy and appropriateness of using assessment results for specific decision-making purposes
(Herman, Heritage, and Goldschmidt 2011). No assessment
is valid for all purposes. While people often refer to the
validity of a test, it is more correct to refer to the validity
of the inferences or interpretations that can be made
from the results of a test. Validity is a matter of
degree; depending on its purpose, an assessment can have high,
moderate, or low validity. For example, a diagnostic reading
test might have a high degree of validity for identifying the
type of decoding problems a student is having, a moderate
degree for diagnosing comprehension problems, a low
degree for identifying vocabulary knowledge difficulties, and
no validity for diagnosing writing conventions difficulties.
Similarly, annual assessments at the end of sixth grade have a high degree of validity for assessing
achievement of standards for those students but no validity for assessing the achievement of the
incoming group of sixth graders.
For an assessment to be valid for the intended purpose, there should be evidence that it does,
in fact, assess what it purports to assess. Test publisher manuals should include information about
the types of validity evidence that have been collected to support the intended uses specified for the
assessment.

Reliability
Reliability refers to how consistently an assessment measures what it is intended to measure (Linn
and Miller 2005). If an assessment is reliable, the results should be replicable. For instance, changes
in the time of administration, day and time of scoring, who
scores the assessment, and the sample of assessment items
should not create inconsistencies in results.
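The relationship between test length and consistency, discussed below as a rule of thumb, has a classical quantitative form: the Spearman-Brown prophecy formula, which predicts how reliability changes when a test is lengthened or shortened. The sketch below is illustrative only and is not part of the framework; the function name and the example values are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predict the reliability of a test whose length is changed by
    length_factor (e.g., 2.0 means the number of items is doubled),
    given its current reliability coefficient (0 to 1).
    Classic Spearman-Brown prophecy formula."""
    return (length_factor * reliability) / (
        1 + (length_factor - 1) * reliability
    )

# Hypothetical example: a 10-item test with reliability 0.60,
# doubled to 20 comparable items.
doubled = spearman_brown(0.60, 2.0)  # (1.2 * 0.6) / 1.6 = 0.75
```

Note that the predicted reliability rises (0.60 to 0.75) as items are added, consistent with the rule of thumb that longer assessments tend to be more reliable, assuming the added items measure the same construct with comparable quality.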
Reliability is important because it is a necessary adjunct
of assessment validity (Linn and Miller 2005). If assessment
results are not consistent, then it is reasonable to conclude
that the results do not accurately measure what the
assessment is purported to measure. A general rule of thumb
for reliability is that the more items an assessment includes,
the higher its reliability. Reliability is assessed primarily with



868 | Chapter 8 Assessment