Statistical Analysis for Education and Psychology Researchers


5 Rather sophisticated and generally not necessary. It makes use of the iterative two-step
expectation-maximization (EM) algorithm to derive maximum likelihood estimates
from incomplete data (see Little and Rubin, 1987, for details and limitations).
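To make the two-step idea concrete, here is a minimal sketch of an EM-style imputation for one variable with missing values, assuming a simple linear (bivariate normal) relationship between two variables x and y. The data, function name and stopping rule are illustrative assumptions, not the procedure described by Little and Rubin:

```python
# Toy EM-style imputation: E-step fills missing y with its conditional
# expectation given x; M-step re-estimates the regression parameters
# from the completed data. Illustrative sketch only.
import numpy as np

def em_impute(x, y, n_iter=20):
    y = y.astype(float)
    miss = np.isnan(y)
    obs = ~miss
    # start from complete-case estimates of slope and intercept
    b1 = np.cov(x[obs], y[obs], bias=True)[0, 1] / np.var(x[obs])
    b0 = y[obs].mean() - b1 * x[obs].mean()
    for _ in range(n_iter):
        y[miss] = b0 + b1 * x[miss]                      # E-step
        b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)   # M-step
        b0 = y.mean() - b1 * x.mean()
    return y, (b0, b1)

# hypothetical data: third y value is missing
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, np.nan, 8.2, 9.8])
y_filled, params = em_impute(x, y)
```

The imputed value settles near the fitted regression line through the observed cases, which is the maximum likelihood estimate under the assumed model.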


Spurious values, that is, values that are extreme but plausible (i.e., within the allowable
range), are more problematic than gross errors and need to be checked very carefully.
Extreme data values which are possible but not consistent with the remaining data are
called outliers. An outlier may be an error or a valid and influential observation. For
example, caseid 5 in the child1 data set on children’s reasoning ability has an outlier
observation for the variable Raven (see Figure 3.7). The value of this observation is 6
which is within range (1–7) but very different to the other data (twice as large as the next
nearest valid value). You may think that caseid 4 has an outlier observation for the same
variable; here the Raven score is 9. However, this is an out-of-range value.
Both of these examples pose a dilemma: what should be done?
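The two kinds of problem value can be separated mechanically. The sketch below echoes the Raven example with hypothetical caseids and scores; the 1–7 allowable range comes from the text, while the flagging rule (a within-range score at least twice the next largest valid score) is an illustrative choice, not a prescribed one:

```python
# Classify each score as out-of-range (a gross error) or a
# within-range extreme value (a possible outlier).
def classify(raven, low=1, high=7):
    flags = {}
    for caseid, score in raven.items():
        if not low <= score <= high:
            flags[caseid] = "out-of-range"
        else:
            others = [s for c, s in raven.items()
                      if c != caseid and low <= s <= high]
            if others and score >= 2 * max(others):
                flags[caseid] = "outlier"
    return flags

# hypothetical scores: caseid 4 is out of range, caseid 5 is extreme
flags = classify({1: 2, 2: 3, 3: 1, 4: 9, 5: 6})
```

Here `flags` separates caseid 4 (an error to recode as missing) from caseid 5 (a valid but suspicious observation that needs judgement).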
If the raw data had been checked and recording, transcription and typing errors had
been eliminated, the Raven value of 9 could be coded as a missing observation. However,
the Raven value of 6 is within range and, provided similar editing and transcription
checks had been made, I would suggest repeating any analyses with and without this
value in the data set. Provided interpretation of the findings is not radically different
for the two analyses, it is not crucial whether the value is counted as valid or is
treated as missing. If, however, different conclusions are drawn depending upon whether
or not this value is counted as valid, then this is an example of an influential
observation. You should interpret such data with care as the influential observations may
represent a different population to the majority of observations.
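The suggested sensitivity check, repeating the analysis with and without the suspect value, can be as simple as comparing summary statistics. The numbers below are illustrative, echoing the within-range Raven scores rather than the actual child1 data:

```python
# Compare mean and SD with and without the suspect (last) score.
from statistics import mean, stdev

scores = [2, 3, 1, 6]          # hypothetical Raven scores; 6 is suspect
with_all = (mean(scores), stdev(scores))
without = (mean(scores[:-1]), stdev(scores[:-1]))
```

If conclusions drawn from `with_all` and `without` differ substantively, the suspect value is an influential observation and should be reported and interpreted with care.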
It is of paramount importance to begin your data analysis with data whose structure
you know and understand and in which you have confidence. The trials and tribulations
of collecting, coding and entering data can only be appreciated by experience.
These simple steps of data processing and data cleaning are an essential prerequisite to
data description and subsequent analysis. Despite this, it is a neglected topic and taken for
granted in most statistical texts. Any data errors attributable to processing errors or
recorded out-of-range values would render subsequent analysis invalid. Gross data errors
are likely to be identified at the early editing stage. It is possible, however, that some
errors will remain undetected. It is important therefore to build into subsequent analyses
diagnostic procedures to guard against erroneous data points unduly influencing your
analysis.


3.3 Describing Distributions

After data processing, editing and cleaning, the final stages of IDA are data description
and formulation of an underlying statistical model for the data. The main purpose of data
description is to present essential and important features of the data usually in tables,
graphs and charts. Space is limited in journal articles and final reports, and besides, many
readers find it difficult to process large amounts of detailed data. What is usually required
is a small number of tables conveying a concise summary of the important aspects of the
data, and perhaps a graph or chart to convey information visually.
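A typical summary table reduces each variable to a handful of figures. As a minimal sketch (the variable name and scores are hypothetical), one row of such a table might be built like this:

```python
# Build one row of a descriptive summary table: n, mean, median, SD,
# minimum and maximum for a single variable.
from statistics import mean, median, stdev

def describe(values, name):
    return {"variable": name,
            "n": len(values),
            "mean": round(mean(values), 2),
            "median": median(values),
            "sd": round(stdev(values), 2),
            "min": min(values),
            "max": max(values)}

row = describe([2, 3, 1, 4, 3, 2, 5], "Raven")
```

Collecting one such row per variable gives the kind of compact table that conveys the essential features of the data without overwhelming the reader.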

