Statistical Analysis for Education and Psychology Researchers

Task

The first task facing an investigator is to perform an initial data analysis.
Begin by scrutinizing the data, assessing its structure and reflecting on the data
collection procedure. You should then summarize the data in a form suitable for
presentation in a journal article and comment on any special features of the data. This
task is intended to be carried out using a computer.
The first part of this task has been completed and is intended to illustrate the use of the
COMPARE programme.


Data processing and cleaning

As presented, the data are a little messy (see Table 2, Appendix A1) and need
‘cleaning up’. Notice that missing data have been coded as a period character (.).
Missing data and spurious values can be checked by using PROC SUMMARY to produce
a frequency count; an example of the output from this procedure is shown in
Figure 3.9. PROC SUMMARY enables any numeric variables with missing or
out-of-range values to be identified. A listing of the data is then used to find the case
numbers (caseid) that have spurious values on those variables. Once a particular caseid
is identified, it can be used to check data values on the data listing against the original data.
This process of identifying out-of-range values and the corresponding caseids can be
speeded up considerably and completed in one step using the SAS programme Check.job
given in Figure 1, Appendix A3. For every numeric variable except caseid, this programme
lists the caseids that have out-of-range values or missing data, printing them against
the variable concerned. Once caseids with missing or out-of-range values are identified
(see Figure 3.18), these values can be checked against the original data.
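
The frequency-count check can also be illustrated outside SAS. What follows is a minimal Python sketch, not the PROC SUMMARY output of Figure 3.9. The records, the function name frequency_counts and the data values are invented for illustration; only the use of a period (.) for missing data and the variable names are taken from the text.

```python
from collections import Counter

# Hypothetical records (not the data of Table 2); "." marks a missing value.
records = [
    {"caseid": 1, "ATTB1": "3", "ATTB2": "5"},
    {"caseid": 2, "ATTB1": ".", "ATTB2": "9"},
    {"caseid": 3, "ATTB1": "3", "ATTB2": "1"},
]


def frequency_counts(rows, variables):
    """Return {variable: Counter of raw values}, keeping "." as its own category."""
    return {var: Counter(row[var] for row in rows) for var in variables}


counts = frequency_counts(records, ["ATTB1", "ATTB2"])

# Print one frequency table per variable, values in sorted order.
for var, counter in counts.items():
    print(var, dict(sorted(counter.items())))
```

Any value in a frequency table that lies outside the range permitted for that variable, or any count against the period, shows that the variable needs checking against the original data.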
Case identifiers with out-of-range values or missing data for each numeric variable
variable name             .   max
ATTB1               .   202     .
ATTB2              37     .     .
                  151     .     .
                    .   188     .
                    .   202     .
                    .   221     .
ATTB3               .     .    82
                  134     .     .
                    .     .   157
                    .   194     .
                    .   202     .
                    .   225     .
ATTB4             118     .     .
                    .     .   151
                    .   176     .
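
A listing of this kind could be produced along the following lines. This is a minimal Python sketch of the idea, not the SAS programme Check.job of Appendix A3. The records, the valid range of 1 to 5, the function name flag_cases and the problem labels are all assumptions made for illustration.

```python
MISSING = "."  # missing-data code used in the raw data


def flag_cases(rows, variables, low, high):
    """Return (variable, caseid, problem) tuples for missing or out-of-range values.

    `low` and `high` are the assumed smallest and largest valid values.
    """
    flagged = []
    for var in variables:
        # List cases in caseid order within each variable.
        for row in sorted(rows, key=lambda r: r["caseid"]):
            raw = row[var]
            if raw == MISSING:
                flagged.append((var, row["caseid"], "missing"))
                continue
            value = float(raw)
            if value < low:
                flagged.append((var, row["caseid"], "below min"))
            elif value > high:
                flagged.append((var, row["caseid"], "above max"))
    return flagged


# Hypothetical records (not the data of Table 2), assumed valid range 1 to 5.
records = [
    {"caseid": 1, "ATTB1": "3", "ATTB2": "0"},
    {"caseid": 2, "ATTB1": ".", "ATTB2": "4"},
    {"caseid": 3, "ATTB1": "7", "ATTB2": "."},
]

flagged = flag_cases(records, ["ATTB1", "ATTB2"], low=1, high=5)
for var, caseid, problem in flagged:
    print(f"{var:<8}{caseid:>6}  {problem}")
```

Each flagged caseid can then be looked up in the original data, as described above.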

