Statistical Analysis for Education and Psychology Researchers

space for character values. These are the default missing data codes in SAS. Most
statistical analysis packages allow you to specify different missing value indicators for
different variables. When data entry is complete, every value of each variable should be one of the
following: an allowable valid response, an impossible out-of-range response, or a
missing response.
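The three-way classification above can be sketched as a small Python function. The book itself works in SAS; this is only an illustrative analogue. Here `None` stands in for the package's default missing code, `missing_codes` is a hypothetical parameter for any extra missing-value indicators declared for a particular variable, and the inclusive limits `low` and `high` would come from a data coding sheet.

```python
from collections.abc import Container
from typing import Optional


def classify(value: Optional[float], low: float, high: float,
             missing_codes: Container = ()) -> str:
    """Classify one recorded value as 'missing', 'out-of-range' or 'valid'.

    low and high are the inclusive limits of the allowable responses.
    None plays the role of the default missing code ('.' for numeric
    variables in SAS); missing_codes holds any additional missing-value
    indicators declared for this particular variable (hypothetical).
    """
    if value is None or value in missing_codes:
        return "missing"
    if value < low or value > high:
        return "out-of-range"
    return "valid"


# Illustrative use on a hypothetical variable coded 1 to 5.
print(classify(3, 1, 5))                     # valid
print(classify(9, 1, 5))                     # out-of-range
print(classify(9, 1, 5, missing_codes={9}))  # missing
print(classify(None, 1, 5))                  # missing
```

Note that the same recorded value, 9, is treated as out-of-range or as missing depending on whether it has been declared a missing-value code for that variable.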

Example 3.3

SAS programmes for a data listing and a frequency count, using the children’s reasoning
ability data set, child1.dat, are shown in Figures 3.6 and 3.8. Output from these two
programmes is illustrated in Figures 3.7 and 3.9. The data listing produced by
PROC PRINT is self-explanatory (see Figure 3.7). You should compare this output with
the original ASCII data file shown in Figure 3.2. The only difference is the column
headed OBS (observation number), which SAS adds.
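A data listing of the kind PROC PRINT produces can be mimicked in a few lines of Python, mainly to make the role of the OBS column concrete. `data_listing` is a hypothetical helper, not how SAS implements PROC PRINT; missing values are printed as `.` to follow SAS's convention for numeric variables, and the variable names and values passed to it below are made up.

```python
from typing import Optional, Sequence


def data_listing(names: Sequence[str],
                 rows: Sequence[Sequence[Optional[float]]]) -> list[str]:
    """Return the lines of a simple listing with an OBS column prepended."""
    header = ["OBS", *names]
    lines = ["  ".join(f"{h:>6}" for h in header)]
    # OBS is the 1-based position of each record; it is added for the
    # listing and is not one of the variables stored in the data.
    for obs, row in enumerate(rows, start=1):
        cells = [str(obs)] + ["." if v is None else str(v) for v in row]
        lines.append("  ".join(f"{c:>6}" for c in cells))
    return lines


# Hypothetical three-record, two-variable data set.
for line in data_listing(["VAR1", "VAR2"], [[7, 2], [None, 9], [8, 1]]):
    print(line)
```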
The output from PROC SUMMARY, illustrated in Figure 3.9, requires
some explanation. The title is on line 0001, and lines 0003 to 0010 list, for each variable,
the number of valid cases (N), the number of cases with missing data
(Nmiss), and the minimum and maximum values. If these values are compared with
what is expected (see the data coding sheet, Figure 3.1), it is evident that each variable has
10 observations in total (N plus Nmiss), and that two variables (AGEYRS, SES) have missing data. There are also
out-of-range values for the variables SES and RAVEN. The case(s) that have
these out-of-range values then need to be located in the data listing, Figure 3.7.
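The per-variable summary described above (N, Nmiss, minimum, maximum) and its comparison with the coding sheet can be sketched in Python as follows. This is an analogue of the reasoning, not of SAS's PROC SUMMARY itself; the function names, the dictionary layout and the `limits` argument are hypothetical, and the data and limits in the usage lines are invented rather than taken from child1.dat or Figure 3.1.

```python
from typing import Optional, Sequence

Row = Sequence[Optional[float]]


def summarise(names: Sequence[str], rows: Sequence[Row]) -> dict:
    """Per-variable N, NMISS, MIN and MAX, with None counted as missing."""
    result = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        present = [v for v in column if v is not None]
        result[name] = {
            "N": len(present),                    # valid (non-missing) cases
            "NMISS": len(column) - len(present),  # cases with missing data
            "MIN": min(present) if present else None,
            "MAX": max(present) if present else None,
        }
    return result


def flag_variables(summary: dict, limits: dict) -> dict:
    """Compare each variable's summary with its (low, high) allowable limits."""
    flags = {}
    for name, stats in summary.items():
        low, high = limits[name]
        problems = []
        if stats["NMISS"] > 0:
            problems.append("missing")
        # If the minimum and maximum lie inside the limits, so does every value.
        if stats["N"] > 0 and (stats["MIN"] < low or stats["MAX"] > high):
            problems.append("out-of-range")
        flags[name] = problems
    return flags


# Hypothetical data: five cases, three variables, limits from an imaginary coding sheet.
names = ["A", "B", "C"]
rows = [[7, 2, 3], [8, 9, 4], [None, 1, 2], [9, 3, 9], [8, None, 5]]
limits = {"A": (5, 12), "B": (1, 3), "C": (1, 5)}
print(flag_variables(summarise(names, rows), limits))
```

As with Figure 3.9, this identifies which variables need attention but not which cases; for every variable N plus NMISS equals the number of cases, and the offending cases still have to be found in the listing.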
The data listing (Figure 3.7) shows that case 2 has an out-of-range value of 9 for the
variable SES and case 4 also has an out-of-range value of 9 for the variable RAVEN.
The two cases with missing data are case 3 (variable AGEYRS) and case 9 (variable
SES). These out-of-range values should be checked against the original data to see
whether they are transcription (copying) errors or recording errors. In the original data (see
Table 3.1), case number 2 had a valid response of 2 for the variable SES, but this was
transcribed incorrectly when it was entered into the data file ‘child1.dat’; in Figure 3.2 it
appears as the value 9. For case number 4, by contrast, the response value of 9 for Raven’s
score is a recording error. After close scrutiny, the data should be edited where appropriate.
Suggestions for dealing with missing data are given in the next section, and later in this
chapter a check programme is illustrated; this is a more systematic way of
checking for out-of-range and missing data values than a frequency count.
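A systematic check of this kind, reporting individual cases rather than just variables, might look something like the following in Python. It is not the SAS check programme the chapter presents; `check_cases` and its arguments are hypothetical, and the data and limits are invented for illustration.

```python
from typing import Optional, Sequence


def check_cases(names: Sequence[str],
                rows: Sequence[Sequence[Optional[float]]],
                limits: dict) -> list[tuple]:
    """List (obs, variable, value, problem) for every missing or out-of-range value.

    obs is the 1-based observation number, matching the OBS column of a
    data listing, so each flagged value can be traced back to its case.
    """
    problems = []
    for obs, row in enumerate(rows, start=1):
        for name, value in zip(names, row):
            low, high = limits[name]
            if value is None:
                problems.append((obs, name, value, "missing"))
            elif value < low or value > high:
                problems.append((obs, name, value, "out-of-range"))
    return problems


# Hypothetical data and limits (not the child1.dat values).
names = ["A", "B", "C"]
rows = [[7, 2, 3], [8, 9, 4], [None, 1, 2], [9, 3, 9], [8, None, 5]]
limits = {"A": (5, 12), "B": (1, 3), "C": (1, 5)}
for problem in check_cases(names, rows, limits):
    print(problem)
```

Each flagged value would then be checked against the original records to decide whether it is a transcription or a recording error before any editing is done.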


Initial data analysis 45