1–3. A blank column separates this variable from the second variable, ‘ageyrs’, which
always occupies columns 5–6. The variable ‘sex’ occupies column 8, the variable ‘ses’
occupies column 10, and the variable ‘raven’ occupies column 12. If we count ‘caseid’ as
the first variable, then the first case has a value of 001 for the first variable, a value of 07
for the second variable, 0 for the third, 1 for the fourth and 1 for the fifth.
001 07 0 1 1
002 09 1 9 1
003  . 1 1 2
004 10 0 1 9
005 07 1 1 6
006 09 1 1 3
007 08 0 1 1
008 07 1 1 1
009 10 1 . 1
010 11 1 2 1
Figure 3.2: Example of a data file, data set child1.dat
The data set illustrated in Figure 3.2 is an example of fixed format data entry, that
is, a fixed number of columns is specified for each and every variable. This gives a
fixed total number of columns per case. Variable formats are possible but tend to be
problematic when data sets are combined. Generally it is advisable to use a fixed format.
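To make the idea of a fixed format concrete, the column layout described above (‘caseid’ in columns 1–3, ‘ageyrs’ in 5–6, ‘sex’ in 8, ‘ses’ in 10 and ‘raven’ in 12) can be sketched in a few lines of Python. This is an illustration only and is not SAS code; the names LAYOUT and parse_record are hypothetical, and treating a blank field or a single period as a missing value is an assumption made for this sketch.

```python
# Column positions taken from the text (1-based, inclusive).
LAYOUT = {
    "caseid": (1, 3),
    "ageyrs": (5, 6),
    "sex": (8, 8),
    "ses": (10, 10),
    "raven": (12, 12),
}


def parse_record(line):
    """Split one fixed-format record into a dict of variable -> value.

    Assumption for this sketch: a blank field or a lone period means
    the value is missing, and is returned as None.
    """
    record = {}
    for name, (start, end) in LAYOUT.items():
        # Convert 1-based inclusive columns to a Python slice.
        field = line[start - 1:end].strip()
        if field in ("", "."):
            record[name] = None
        elif name == "caseid":
            record[name] = field  # keep leading zeros of the identifier
        else:
            record[name] = int(field)
    return record


# First case as described in the text: 001, 07, 0, 1, 1.
print(parse_record("001 07 0 1 1"))
```

Because every variable always sits in the same columns, the same layout table serves for every case, which is what makes fixed-format files straightforward to combine.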
Data Verification
After data has been coded, typed into a suitable editing or data entry package, and saved
as a data file, the next step is data verification. This means that the data input
procedure is checked for transcription errors. If possible, the data should be entered a
second time, and any differences between the two versions of the data set identified and
checked against the original data. A convenient procedure in SAS is PROC COMPARE, which
compares the values of variables in two data sets and can report the differences found for
each observation and the number of variables in both data sets that were found to have
unequal values.
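The logic of double-entry verification can be sketched in Python as follows. This is not PROC COMPARE and does not reproduce its output; it is a minimal illustration of the same idea, assuming each version of the data set is held as a dictionary keyed by case identifier. The function name compare_entries and the second-entry values are hypothetical, with one transcription error introduced deliberately.

```python
def compare_entries(first_entry, second_entry):
    """List every (caseid, variable, first value, second value) mismatch."""
    differences = []
    # Visit every case that appears in either version of the data set.
    for caseid in sorted(set(first_entry) | set(second_entry)):
        row_a = first_entry.get(caseid, {})
        row_b = second_entry.get(caseid, {})
        for variable in sorted(set(row_a) | set(row_b)):
            value_a = row_a.get(variable)
            value_b = row_b.get(variable)
            if value_a != value_b:
                differences.append((caseid, variable, value_a, value_b))
    return differences


# Two cases from the example data file, as first entered.
first_entry = {
    "001": {"ageyrs": 7, "sex": 0, "ses": 1, "raven": 1},
    "002": {"ageyrs": 9, "sex": 1, "ses": 9, "raven": 1},
}
# Hypothetical second entry: 'ses' for case 002 was mistyped as 1.
second_entry = {
    "001": {"ageyrs": 7, "sex": 0, "ses": 1, "raven": 1},
    "002": {"ageyrs": 9, "sex": 1, "ses": 1, "raven": 1},
}

differences = compare_entries(first_entry, second_entry)
unequal_variables = {variable for _, variable, _, _ in differences}

for caseid, variable, value_a, value_b in differences:
    print(f"case {caseid}: {variable} = {value_a} (first) vs {value_b} (second)")
print("variables with unequal values:", len(unequal_variables))
```

Each reported difference should then be checked against the original data to decide which of the two entered values is correct.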
The SAS System
In the following section the basic structure of the SAS system (Statistical Analysis
System) is introduced, and the use of the SAS procedure PROC COMPARE to verify a
data set is illustrated for the PC version of SAS.
The SAS system, which is available on mainframe and personal computers, is a
software system for the modification and statistical analysis of data. A great advantage of
the SAS system is its dual function of offering both extensive ‘off the shelf’ statistical
procedures and a high-level programming language capability. This latter facility means
Statistical analysis for education and psychology researchers 38