particular values of variables. A coding sheet to accompany the children’s reasoning
ability data set is shown in Figure 3.1.
It is preferable if each case has a unique numeric case identifier, for example a case
number, patient reference number, or hospital number. This number may be generated by the
researcher, or existing case numbers may be used provided they are unique. If you are
creating two or more data sets for the same individuals, the case identifier should be the
same in each so that the data sets can be easily combined if required. A unique numeric
case identifier also simplifies editing of the data should any cases be found to have odd
or out-of-range values. SAS is particularly flexible when it comes to importing, exporting,
or combining data sets.
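The role of the case identifier when combining data sets can be illustrated with a short sketch. It is written in Python rather than SAS, purely as an illustration; the records, the function names duplicate_ids and merge_by_id, and the choice to keep only cases present in both data sets are all assumptions made for the example, not the book's data or method.

```python
from collections import Counter

# Hypothetical records for illustration only; these are not the book's data.
reasoning = [
    {"caseid": 1, "raven": 4},
    {"caseid": 2, "raven": 6},
    {"caseid": 3, "raven": 2},
]
background = [
    {"caseid": 1, "ageyrs": 7},
    {"caseid": 3, "ageyrs": 9},
]


def duplicate_ids(records, key="caseid"):
    """Return the sorted identifiers that occur more than once."""
    counts = Counter(rec[key] for rec in records)
    return sorted(cid for cid, n in counts.items() if n > 1)


def merge_by_id(left, right, key="caseid"):
    """Combine two data sets case by case, keeping cases present in both."""
    # Refuse to merge if either data set has a non-unique identifier.
    for name, records in (("left", left), ("right", right)):
        dups = duplicate_ids(records, key)
        if dups:
            raise ValueError(f"{name} data set has duplicate {key} values: {dups}")
    right_by_id = {rec[key]: rec for rec in right}
    return [
        {**rec, **right_by_id[rec[key]]}
        for rec in left
        if rec[key] in right_by_id
    ]


merged = merge_by_id(reasoning, background)
```

Checking for duplicate identifiers before merging is a design choice in this sketch: with a non-unique identifier, it would be ambiguous which records belong together.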
Particular care should be taken with the coding of nominal variables and missing data.
The numeric codes given to the categories of a nominal variable serve only as labels, and
the choice of values is therefore arbitrary. In this example, the variable gender is coded
0 for male and 1 for female, and socioeconomic status is coded 1 for low SES and 2 for
high SES. Numeric values used in this way act simply as labels and should not be used in
any subsequent computations, as they would give nonsense results. It is suggested that
missing data is coded as a period (full stop) (.) for numeric variables and a blank space
for character variables. These are the SAS default options. The advantage of using these
defaults is that no additional definition of missing values is required (unless you choose
to specify types of missing or non-valid data). Similarly, if different categories of
missing data are to be coded, such as ‘don’t know’, ‘not applicable’, and ‘not valid’,
then these should be assigned numeric values which are treated as indicators and should
not be used in a numeric way in analyses (you may of course wish to count them).
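The two coding rules above (a period marks a missing numeric value; nominal codes are labels to be counted, not averaged) can be sketched as follows. This is a Python illustration, not SAS; the raw values are made up for the example, and the names parse_numeric, mean_of_valid and SEX_LABELS are hypothetical.

```python
from collections import Counter

MISSING = "."  # the period used to code a missing numeric value

# Labels for the nominal codes described in the text (0 = male, 1 = female).
SEX_LABELS = {0: "male", 1: "female"}


def parse_numeric(field):
    """Convert one raw field to a float, or None if it is coded as missing."""
    field = field.strip()
    if field == MISSING:
        return None
    return float(field)


def mean_of_valid(values):
    """Mean of the non-missing values; None if every value is missing."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Made-up raw fields for illustration only.
raw_scores = ["4", ".", "6", "2"]
scores = [parse_numeric(f) for f in raw_scores]

n_missing = sum(v is None for v in scores)  # missing values can be counted
mean_score = mean_of_valid(scores)          # but are left out of the mean

# Nominal codes are tabulated by label rather than summed or averaged.
sex_codes = [0, 1, 1, 0, 1]
sex_counts = Counter(SEX_LABELS[code] for code in sex_codes)
```

Here the missing score is excluded from the mean rather than treated as zero, and gender is summarised as a frequency count, since an "average gender code" would be meaningless.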
RESEARCHER: Joan Baron           D/O/C: 29/11/94
DSN: ‘Child1.dat’                NUMBER OF CASES = 10

Variable        Variable   Format   Column   Column   Var      Code
                Name                Begin    End      Range    Miss
Case id         caseid     3        1        3        1–10     .
Age in years    ageyrs     2        5        6        5–11     .
Sex             sex        1        8        8        0–1      .
SES             ses        1        10       10       1–2      .
Raven score     raven      1        12       12       1–7      .

Note that the data set has been given the name DSN=child1.dat; the .dat extension is used
throughout this text to denote a data file.
Figure 3.1: Coding sheet for children’s reasoning ability data set
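A coding sheet of this kind can be used directly to read a fixed-column data file and to flag out-of-range values. The sketch below is in Python rather than SAS and is only an illustration: the column positions and ranges are taken from Figure 3.1, but the two data lines are made up, and the names LAYOUT, parse_line and out_of_range are hypothetical.

```python
# name: (column begin, column end, minimum, maximum), as listed in Figure 3.1.
LAYOUT = {
    "caseid": (1, 3, 1, 10),
    "ageyrs": (5, 6, 5, 11),
    "sex": (8, 8, 0, 1),
    "ses": (10, 10, 1, 2),
    "raven": (12, 12, 1, 7),
}


def parse_line(line):
    """Split one fixed-column line into integer fields; '.' or blank -> None."""
    record = {}
    for name, (begin, end, _lo, _hi) in LAYOUT.items():
        # Coding-sheet columns are 1-based and inclusive.
        field = line[begin - 1:end].strip()
        record[name] = None if field in (".", "") else int(field)
    return record


def out_of_range(record):
    """Names of the non-missing variables whose value falls outside its range."""
    return [
        name
        for name, (_b, _e, lo, hi) in LAYOUT.items()
        if record[name] is not None and not lo <= record[name] <= hi
    ]


# Made-up data lines for illustration only; these are not from child1.dat.
lines = [
    "001 07 0 1 4",
    "002 12 1 2 .",
]
records = [parse_line(line) for line in lines]
problems = {rec["caseid"]: out_of_range(rec) for rec in records}
```

In this example the second case has an age of 12, outside the 5–11 range on the coding sheet, so it is flagged by its case identifier; its missing Raven score is left as missing rather than reported as out of range.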
Once data has been coded (often questionnaires are designed pre-coded so that data can
be entered directly from the questionnaire without first having to enter it onto a data
coding sheet) it is then entered into a computer data file. There are two options for this
stage of data entry: