Coding data
After these preliminary considerations you should decide how the data will be coded so
that they can be analyzed. The initial data analyses should make it possible to identify
obvious errors, omissions, and odd values, which may be either errors or valid outlying
values. Thought should also be given to the choice of variable format, that is, whether
the value of a variable is numeric or character, and how many columns each variable occupies.
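As a minimal sketch of this format decision (written in Python rather than SAS or SPSS; the function name `variable_format` is hypothetical), the code below checks whether every non-missing value of a variable can be read as a number and counts the columns needed by the widest value, so that a decimal point or a minus sign takes a column of its own. It assumes the SAS default missing codes, a full stop or a blank.

```python
def variable_format(values):
    """Suggest a format for one variable from its raw text values.

    Returns ("numeric", width) if every non-missing value parses as a
    number, otherwise ("character", width). Width is the length of the
    longest value, so a decimal point or minus sign occupies a column.
    """
    # Assumed missing codes: "." (SAS numeric default) and blank (character).
    present = [v for v in values if v.strip() not in (".", "")]
    width = max((len(v) for v in present), default=1)
    for v in present:
        try:
            float(v)
        except ValueError:
            return ("character", width)
    return ("numeric", width)


print(variable_format(["7", "9", ".", "10"]))            # ('numeric', 2)
print(variable_format(["12.5", "3.75", "-0.5"]))         # ('numeric', 4)
print(variable_format(["Henry Forbes", "Jane Hopper"]))  # ('character', 12)
```

A value such as 12.5 therefore needs four columns, not three, because the decimal point is counted.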
It is helpful, for each data set, to construct a data coding sheet which contains the
following summary information: name of researcher, data set name, date collected, and
total number of cases/individuals. For each variable the following information is
required:
- full variable description;
- short variable name (up to 7 characters for use in statistical programmes);
- column format for the variable (number of columns needed, including a column for the
decimal point if required);
- possible variable range (minimum and maximum values);
- values for missing data (a full stop (.) for missing numeric values and a blank for missing
character values; these are the SAS system default values);
- labels for nominal variables, which may also be helpful; for example, for the variable
religion, 1=Jewish, 2=Roman Catholic and 3=Church of England, and for the variable sex,
0=Male and 1=Female.
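One way to keep a coding sheet in machine-readable form is sketched below in Python. The class and field names (`Variable`, `CodingSheet`, `label`) are hypothetical, and the 7-character check simply mirrors the limit given in the list above rather than any particular programme's rule. Because `religion` has eight characters, the sketch uses the short name `relig`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Variable:
    """One variable entry on a data coding sheet."""
    description: str                 # full variable description
    name: str                        # short name for the statistical programme
    columns: int                     # columns needed, incl. any decimal point
    minimum: float | None = None     # possible range (None = not applicable)
    maximum: float | None = None
    missing: str = "."               # SAS default: "." numeric, " " character
    labels: dict[int, str] = field(default_factory=dict)  # nominal codes

    def __post_init__(self) -> None:
        # Enforce the short-name limit stated on the coding sheet.
        if len(self.name) > 7:
            raise ValueError(f"short name {self.name!r} is longer than 7 characters")

    def label(self, code: int) -> str:
        """Return the label for a code, or the code itself if unlabelled."""
        return self.labels.get(code, str(code))


@dataclass
class CodingSheet:
    """Summary information for a data set plus one entry per variable."""
    researcher: str
    data_set: str
    date_collected: str
    n_cases: int
    variables: list[Variable] = field(default_factory=list)


# The two nominal variables described in the text.
sex = Variable("Sex", "sex", 1, 0, 1, labels={0: "Male", 1: "Female"})
relig = Variable("Religion", "relig", 1, 1, 3,
                 labels={1: "Jewish", 2: "Roman Catholic", 3: "Church of England"})

print(sex.label(1), "/", relig.label(3))  # Female / Church of England
```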
If the data on children’s reasoning ability are to be analyzed using statistical programmes
such as SAS or SPSS, they need to be coded, that is, numbers need to be assigned to
Example 3.1
Data collected by psychologists investigating the relationship between children’s
reasoning ability, age and social class (SES) are shown in Table 3.1.
Table 3.1: Data for children’s age, reasoning ability and social class
Case               Age (yrs)   Sex   SES   Raven score (reasoning)
Henry Forbes          7        0     1     1
Joyce Bishop          9        1     2     1
Jane Hopper           .        1     1     2
John Kylivee         10        0     1     9
Louise Green          7        1     1     6
Jenna Maccoby         9        1     1     3
Justin Langholm       8        0     1     1
Heather Lochlin       7        1     1     1
Sian Jones           10        1     .     1
Susan Ishihara       11        1     2     1
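A first pass of initial data analysis on these data can be sketched in Python: code the cases, read the full stop as a missing value, and report for each variable how many values are present, how many are missing, and the smallest and largest values, so that omissions and odd values stand out. The sketch uses the first nine cases of Table 3.1 only; the names `ROWS`, `to_number` and `summarize` are hypothetical, and this is not SAS or SPSS code.

```python
# First nine cases of Table 3.1: (case, age, sex, SES, Raven score).
# "." marks a missing value, following the SAS numeric default.
ROWS = [
    ("Henry Forbes",    "7",  "0", "1", "1"),
    ("Joyce Bishop",    "9",  "1", "2", "1"),
    ("Jane Hopper",     ".",  "1", "1", "2"),
    ("John Kylivee",    "10", "0", "1", "9"),
    ("Louise Green",    "7",  "1", "1", "6"),
    ("Jenna Maccoby",   "9",  "1", "1", "3"),
    ("Justin Langholm", "8",  "0", "1", "1"),
    ("Heather Lochlin", "7",  "1", "1", "1"),
    ("Sian Jones",      "10", "1", ".", "1"),
]
VARIABLES = ("age", "sex", "ses", "raven")


def to_number(text):
    """Convert a coded value to an int, treating '.' as missing (None)."""
    return None if text == "." else int(text)


def summarize(rows):
    """Per variable: count present, count missing, minimum and maximum."""
    summary = {}
    for i, name in enumerate(VARIABLES, start=1):  # column 0 is the case name
        values = [to_number(row[i]) for row in rows]
        present = [v for v in values if v is not None]
        summary[name] = {
            "n": len(present),
            "missing": len(values) - len(present),
            "min": min(present, default=None),
            "max": max(present, default=None),
        }
    return summary


for name, stats in summarize(ROWS).items():
    print(name, stats)
```

For these nine cases it reports one missing age and one missing SES value, ages from 7 to 10, and Raven scores from 1 to 9; whether a value such as the score of 9 is an error or a valid outlying value still has to be judged against the coding sheet.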
Initial data analysis 35