- Use of a dedicated (belonging to a particular statistical software programme) data entry
programme to create a specially formatted data file, for example, use of the
SAS/INSIGHT programme editor or the SPSS data editor.
- Use of a DOS text editor (or similar editor) to produce an ASCII text file containing all
the data.
The second option is illustrated here because it is of more general utility. The advantage
of creating an ASCII text file for your data is that you produce an exportable data file
that can be read by most statistical programmes and spreadsheets. Some statistical
analysis packages have their own dedicated data entry programmes. These are useful if
you intend to use just that particular software package (although some do have facilities
for producing ASCII text data files). A specially formatted data file produced by a
dedicated data entry programme can, as a rule, be read directly only by that particular
software package; for example, an SPSS data file cannot be read directly by the SAS or
ML3E statistical programmes. In addition, using a dedicated data entry programme means
learning another set of data entry instructions.
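The portability argument can be made concrete with a short sketch. The Python below is not part of the original text and uses hypothetical values; it writes a small data matrix as plain ASCII text, one case per line with variables separated by a single blank column, and then reads it back using nothing more than whitespace splitting, which is roughly what any statistical programme or spreadsheet import routine can do with such a file.

```python
import io

def write_ascii_matrix(rows, stream):
    """Write one case per line, with variables separated by a single blank column."""
    for row in rows:
        line = " ".join(row)
        line.encode("ascii")  # raises UnicodeEncodeError if a non-ASCII character slips in
        stream.write(line + "\n")

def read_ascii_matrix(stream):
    """Read the data back using plain whitespace splitting, as almost any programme can."""
    return [line.split() for line in stream if line.strip()]

# Hypothetical cases: a case id followed by four variables (illustrative values only)
cases = [["01", "1", "12", "15", "08"],
         ["02", "2", "09", "11", "14"]]

buffer = io.StringIO()
write_ascii_matrix(cases, buffer)
print(buffer.getvalue())
buffer.seek(0)
print(read_ascii_matrix(buffer) == cases)  # True
```

An in-memory buffer is used only to keep the sketch self-contained; with a real file you would pass a stream opened with, for example, `open("mydata.dat", "w", encoding="ascii")`, where the file name is hypothetical.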
Saving Data in a Computer Data File
Data in a computer data file are usually arranged in a matrix consisting of rows and
columns. For each subject or case, there is one row or line of data (it is possible to have
more than one row of data per case). The columns of data represent variables. Usually
there is more than one variable, in which case it may be helpful to separate different
variables by a blank column. This format facilitates checking of the data file. When there
are many variables it is better to omit the blank space because more variables can then be
fitted onto one line. However, it does not matter if a case has more columns of data than
will fit on one line. The recommended maximum is 72 columns of data per line. This
suggested restriction is so that a whole row of data can be seen on a computer screen at
once. Individual cases need not be limited to 72 columns. If a case consisted of 130
columns of data, the first 72 characters would occupy the first 72 columns on line one of the
data file and the remaining 58 characters would occupy the first 58 columns on the
second line of the data file. Similarly, the second case would occupy lines three and four;
in this example there would be two lines of data per case. Appropriate data format
statements could be given to ensure that SAS or SPSS reads two lines of data per case.
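The 130-column example can be checked with a little arithmetic in code. The following Python sketch (not from the original; the function names and filler data are hypothetical) splits each case record into lines of at most 72 columns and then rejoins every two consecutive lines into one case, which is the job the data format statements perform when they tell SAS or SPSS how many lines make up a case.

```python
MAX_COLS = 72  # recommended maximum columns of data per line, as suggested in the text

def wrap_case(record, width=MAX_COLS):
    """Split one case's fixed-column record into consecutive lines of at most `width` columns."""
    return [record[i:i + width] for i in range(0, len(record), width)]

def read_cases(lines, lines_per_case):
    """Rejoin every `lines_per_case` consecutive lines into a single case record."""
    if len(lines) % lines_per_case != 0:
        raise ValueError("number of lines is not a multiple of lines_per_case")
    return ["".join(lines[i:i + lines_per_case])
            for i in range(0, len(lines), lines_per_case)]

# Two hypothetical cases, each 130 columns wide (repeated digits used as filler data)
case1 = "0123456789" * 13
case2 = "9876543210" * 13

data_lines = wrap_case(case1) + wrap_case(case2)
print([len(line) for line in data_lines])  # [72, 58, 72, 58]
records = read_cases(data_lines, lines_per_case=2)
print(records == [case1, case2])  # True
```

Case one occupies lines one and two and case two occupies lines three and four, exactly as described above. The sketch assumes trailing blanks on a line are preserved; if an editor strips them, a reader that relies on column positions would need to pad each line back to its expected width.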
Example 3.2
An ASCII data file for the children’s reasoning ability data set, DSN=child1.dat, is
shown in Figure 3.2. These data were entered with a DOS text editor, although any text
editor that can produce an ASCII text file would be suitable. (Beware: some text editors
add control characters to the end of a file, and this can cause problems when the text data
file is read by your statistical analysis programme.)
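Stray control characters are easy to miss on screen, so a quick check of the raw bytes can help. The Python sketch below is an illustration, not a procedure from the text: it lists any control characters other than tab, carriage return and line feed, and strips the Ctrl-Z (byte 26) end-of-file marker that some DOS-era editors appended. The text does not say which control characters cause trouble, so treating Ctrl-Z as the likely culprit is an assumption, and the sample bytes are hypothetical.

```python
def find_control_characters(raw: bytes):
    """Return (offset, byte value) for each control character other than tab, LF and CR."""
    allowed = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return are normal in text data
    return [(i, b) for i, b in enumerate(raw)
            if (b < 0x20 or b == 0x7F) and b not in allowed]

def strip_trailing_eof_marker(raw: bytes) -> bytes:
    """Remove DOS-style Ctrl-Z (0x1A) end-of-file markers from the end of the data."""
    return raw.rstrip(b"\x1a")

# Hypothetical two-case file with DOS line endings and a trailing Ctrl-Z
sample = b"01 1 12\r\n02 2 15\r\n\x1a"
print(find_control_characters(sample))  # [(18, 26)]
cleaned = strip_trailing_eof_marker(sample)
print(find_control_characters(cleaned))  # []
```

For a file on disk, the raw bytes could be obtained with `open(path, "rb").read()` before running the same checks.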
We can see that there are 10 rows in the data file child1.dat, one row per case; hence
10 cases are represented. For each case there are 5 variables, and the variables are
separated by blank columns. The first variable here, ‘caseid’, always occupies columns