Statistical Analysis for Education and Psychology Researchers


coefficients were based should be checked. Sample sizes of at least 100 are required for
satisfactory validity.
Another important aspect to consider whenever choosing a test is test reliability, or
the consistency of a test (see Chapter 1). Whenever one obtains educational or
psychological test scores such as aptitude, personality, intelligence or achievement
measures, these are what are called observed scores. These observed scores can be
thought of as consisting of two parts, a true score component reflecting the amount of
attribute of interest and a nuisance or error component which reflects various sources of
error, such as measurement error, transcription error, anything in fact which is not the
true score component. It can be stated that:
Observed score = true score + error score


The reliability of an observed measurement depends upon the relative proportions of the
true score and error score components. When the error portion is large in comparison to
the true score, the reliability is low. To increase test reliability all sources of error should
be reduced. As with validity, measurement reliability is given by a reliability coefficient
which is often presented as a correlation coefficient, ‘r’. Unlike a Pearson correlation
coefficient which can range from −1 to +1, the reliability coefficient ranges from zero,
total absence of reliability, to +1, perfect reliability. An obvious question to ask, but not
an easy one to answer, is how large a reliability coefficient must be to count as good,
given that the maximum is r=+1.
Instruments designed to measure attitudes and personality traits (affective type tests) tend
to have lower coefficients than measures of achievement or cognitive ability (cognitive
type tests). For affective type tests coefficients as low as r=0.7 are acceptable whereas
carefully constructed and standardized cognitive tests would be expected to have
coefficients above r=0.9.
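
This 0-to-1 range follows directly from the true-score model above. In classical test theory (a standard formulation, stated here as a sketch rather than drawn from the text), the reliability coefficient is the proportion of observed-score variance that is true-score variance:

```latex
% Classical test theory: observed score X decomposes into
% true score T and error E, assumed uncorrelated.
\[
X = T + E, \qquad
\sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad
r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}
        = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}.
\]
```

When the error variance is zero the ratio is 1 (perfect reliability); as error variance grows relative to true-score variance the ratio approaches 0, which is why the theoretical coefficient runs from 0 to +1 rather than from −1 to +1.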
There are three general types of test reliability measures, internal consistency,
stability and equivalence, each appropriate for different circumstances. These reliability
coefficients are summarized in Table 2.2; computational sketches of the split-half and
alpha coefficients follow the table.


Table 2.2: Reliability coefficients

Coefficient of reliability: Internal consistency
What is measured: The extent to which all test items measure the same construct.
Comment: The reliability of a scale can generally be increased by increasing the number of items in the scale and by increasing their homogeneity, that is, the interrelatedness of the items.

Coefficient of reliability: Split half, measured by the Pearson correlation ‘r’
What is measured: Two scores are calculated for each person, one for each half of the test. The Pearson correlation between these two half-test scores gives a measure of internal consistency.
Comment: This procedure effectively reduces the number of items on the test by 50 per cent. The Spearman-Brown prophecy formula can correct for this (see Cronbach, 1990).

Coefficient of reliability: Cronbach’s Alpha
What is measured: A measure based on the ratio of the variability of item scores to the overall score variability.
Comment: Requires that all test items have equal variability, that all items are equally interrelated and that a single construct is measured.

Coefficient of reliability: Kuder-Richardson (KR-20)
What is measured: A measure of the ratio of item variability to overall score variability for dichotomously (right/wrong) scored items.
Comment: Equivalent to Cronbach’s Alpha, and requires that items are scored dichotomously.
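
The split-half procedure and its Spearman-Brown correction can be computed directly. Below is a minimal sketch in Python; the function name, the odd/even item split and the simulated data are illustrative assumptions, not taken from the text:

```python
import numpy as np

def split_half_reliability(scores):
    """Split-half reliability with the Spearman-Brown correction.

    scores: 2-D array, rows = persons, columns = test items.
    The test is split into odd- and even-numbered items, the two
    half-test totals are correlated with Pearson's r, and that
    correlation is corrected for the halved test length.
    """
    scores = np.asarray(scores, dtype=float)
    odd_total = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
    even_total = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_total, even_total)[0, 1]
    # Spearman-Brown: estimated reliability of the full-length test
    return 2 * r_half / (1 + r_half)

# Illustrative data: 100 persons, 10 items driven by a common trait
rng = np.random.default_rng(42)
trait = rng.normal(size=(100, 1))
items = trait + rng.normal(scale=1.0, size=(100, 10))
print(round(split_half_reliability(items), 3))
```

The correction matters because, as the table notes, splitting halves the effective test length, and shorter tests are less reliable; the formula scales the half-test correlation back up to the full-length test.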
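Cronbach’s Alpha, described in the table as a ratio of item-score variability to overall score variability, can be sketched the same way using the standard computing formula alpha = k/(k−1) × (1 − Σ item variances / total-score variance); again the function name is an illustrative assumption:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons-by-items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                          # number of items
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# With dichotomous (0/1) items, this same computation yields the
# Kuder-Richardson (KR-20) coefficient mentioned in Table 2.2.
```

Applied to the simulated `items` matrix above, `cronbach_alpha(items)` should closely agree with the corrected split-half estimate, since both estimate the same internal-consistency reliability.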

