Statistical Analysis for Education and Psychology Researchers


Coefficient of reliability | What is measured | Comment

Kuder-Richardson K-R 20 | ...variability to total score variability. Equivalent to Cronbach's alpha but modified for dichotomous scoring. | ...the same assumptions. Used when items are scored dichotomously, i.e., 1 or 0. The K-R 21 is a simplified version of the K-R 20 formula but makes the additional assumption that all items are of equal difficulty.

Stability | The correlation between measurements taken on two occasions. | Not useful for unstable attributes, e.g., anxiety.

Test-retest, measured by an appropriate correlation coefficient, e.g., Pearson r or a rank correlation | Correlation between scores on the same test administered on two separate occasions. If the interval between testings is no longer than 2 weeks, this is often called the coefficient of dependability (see Cattell et al., 1970). | Tends to overestimate test reliability.

Equivalence | The correlation between scores on parallel tests. |

Alternate form, measured by an appropriate correlation | Correlation between two parallel forms of the same test (different items in each form) given on two separate occasions. May be called the coefficient of equivalence. | Parallel forms eliminate possible memory effects and also have the advantage of covering more of the domain of interest, because more items are used.

Inter-rater, measured by suitable statistics: per cent agreement, Kendall's coefficient of concordance W, and kappa (κ) | The equivalence of independent observers' judgments of an attribute or behaviour. | The effect of chance agreement can be corrected by using Scott's (1955) coefficient, pi (π).
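Several coefficients in the table can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions, not the author's procedure: it uses the standard textbook formulas for K-R 20 and K-R 21 (p is the proportion of examinees scoring 1 on an item, q = 1 − p, and variances are divide-by-N), plain per cent agreement, and Scott's pi with chance agreement computed from category proportions pooled across both raters. The function names and both small data sets are hypothetical.

```python
from collections import Counter
from statistics import mean, pvariance


def kr20(scores):
    """Kuder-Richardson formula 20.

    scores[j][i] is examinee j's score (1 or 0) on item i.
    Assumes at least two items and that total scores are not all equal.
    """
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    # Sum of item variances p*q, where p = proportion scoring 1 on the item
    sum_pq = 0.0
    for i in range(k):
        p = mean(row[i] for row in scores)
        sum_pq += p * (1 - p)
    # Divide-by-N variance of total scores, matching the p*q item variances
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


def kr21(scores):
    """K-R 21: uses only the mean and variance of total scores,
    which amounts to assuming all items are of equal difficulty."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m = mean(totals)
    var_total = pvariance(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var_total))


def percent_agreement(rater_a, rater_b):
    """Per cent of cases on which two raters assign the same category."""
    agree = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agree / len(rater_a)


def scott_pi(rater_a, rater_b):
    """Scott's pi: observed agreement corrected for chance agreement.

    Chance agreement uses category proportions pooled over both raters.
    """
    n = len(rater_a)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    pooled = Counter(rater_a) + Counter(rater_b)
    p_exp = sum((count / (2 * n)) ** 2 for count in pooled.values())
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical data: 5 examinees x 4 dichotomously scored items
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical ratings of 10 observed episodes by two independent raters
rater_a = ["on", "on", "off", "on", "off", "on", "on", "off", "on", "on"]
rater_b = ["on", "off", "off", "on", "off", "on", "on", "on", "on", "on"]

print(round(kr20(scores), 3))  # 0.8
print(round(kr21(scores), 3))  # 0.667
print(percent_agreement(rater_a, rater_b))  # 80.0
print(round(scott_pi(rater_a, rater_b), 3))  # 0.524
```

Because the four hypothetical items differ in difficulty, K-R 21 (0.667) comes out lower than K-R 20 (0.8), consistent with its extra equal-difficulty assumption. Likewise, the raters agree on 80 per cent of episodes, but once chance agreement is removed Scott's pi is only about 0.52.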

The Standard Error of Measurement of a Test

The idea of sampling error, or sampling variability, was introduced in Chapter 1. We now
extend this idea to the standard error of any set of measurements. Expressed simply, the
variability of a set of repeated measurements is called its standard error of measurement
(s.e.m.) and is an index of how widely the measurements vary. The larger the variability,
the larger the s.e.m. and the less accurate the measure. Standard error is a very important
statistical concept that will appear many times throughout this book. It applies to sample
statistics as well as to scale measures and test scores. For example, a sample average has a
standard error, namely the standard error of the mean. This idea was presented in Chapter 1,
where it was called the sampling error (variability) of a sample average.
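The standard error of the mean recalled above can be illustrated with a short simulation. This is a minimal sketch, not the author's procedure: the population of 1200 reading scores is hypothetical, drawn from a normal distribution with an assumed mean of 50 and standard deviation of 10. The code takes 100 random samples of 20 children, as in the example that follows, and compares the standard deviation of the 100 sample means with the standard result σ/√n.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 1200 reading scores (assumed mean 50, SD 10)
population = [random.gauss(50, 10) for _ in range(1200)]

SAMPLE_SIZE = 20
N_SAMPLES = 100

# Draw 100 random samples of 20 children and record each sample mean
sample_means = [mean(random.sample(population, SAMPLE_SIZE)) for _ in range(N_SAMPLES)]

# The mean of the sample means estimates the population mean
mean_of_means = mean(sample_means)
# The spread (SD) of the sample means estimates the standard error of the mean
empirical_se = stdev(sample_means)
# Standard result: population SD divided by the square root of the sample size
theoretical_se = pstdev(population) / SAMPLE_SIZE ** 0.5

print(f"population mean     {mean(population):.2f}")
print(f"mean of 100 means   {mean_of_means:.2f}")
print(f"empirical s.e.      {empirical_se:.2f}")
print(f"sigma / sqrt(n)     {theoretical_se:.2f}")
```

The empirical value will not match σ/√n exactly, because 100 sample means are themselves only a sample. Since each sample is drawn without replacement from a finite population, the exact standard error is also slightly smaller than σ/√n (the finite population correction), but with 1200 children and samples of 20 the difference is under 1%.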
To review, suppose there were a large population of school children (over 1000) and a
random sample of 20 children were taken. It would then be possible to calculate the average,
or mean, reading score for this sample of 20 children. Call this figure mean 1. We could
continue taking random samples of 20 children, calculating the mean of each sample, until
there were 100 sample means. There would now be 100 means, mean(1)...mean(100), and the mean
of these 100 sample means would be a reasonable estimate of the population mean reading

