Statistical Analysis for Education and Psychology Researchers

that virtually any manipulation and analysis of data is possible, thereby making the SAS
system a very flexible and powerful data analysis system.
Statistical analysis of data using the SAS system usually takes place in three simple
steps: a data step, a procedure step and an output step. First you create a SAS data set
from your own raw data. This is the DATA step. You then analyse your data using any of
the appropriate statistical procedures. This is the procedure or PROC step. Finally the
results of your analysis are produced and directed to an appropriate location such as your
monitor screen or a computer file. This is the output step. Schlotzhauer and Littell (1987)
provide a straightforward guide to elementary statistical analysis using the SAS system.
Spector (1993) presents a very readable problem-based introduction to programming
using the SAS language.
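
The three steps described above can be sketched as one short program. This is a minimal illustration only: the data set name demo, the inline data values and the output file name results.lst are hypothetical and do not come from the children's reasoning ability data. Raw data are supplied inline with DATALINES so that the sketch does not depend on an external file, and PROC PRINTTO is used as one possible way of directing procedure output to a file.

```sas
/* DATA step: create a SAS data set from raw data.                 */
/* The values below are made up purely for illustration.           */
data demo;
  input caseid 1-3 ageyrs 5-6;
  datalines;
001 06
002 07
003 06
;
run;

/* Output step (optional): route procedure output to a file        */
/* instead of the default destination. The file name is assumed.   */
proc printto print='results.lst' new;
run;

/* PROC step: analyse the data set with a statistical procedure.   */
proc means data=demo n mean min max;
  var ageyrs;
run;

/* Restore the default output destination. */
proc printto;
run;
```

If the two PROC PRINTTO steps are omitted, the results of PROC MEANS are simply sent to the default output destination.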


SAS Data Step

Any data that is to be analysed using SAS software, for example, the data set on
children’s reasoning ability (see Figure 3.2), must first be turned into a SAS data set so
that the SAS system recognizes it. This simple procedure is called the DATA step. It has
three parts:



  • The DATA statement, which assigns a name to the SAS data set.

  • The INFILE statement, which tells SAS software where the ASCII data file is located.

  • The INPUT statement, which describes the data format. It declares variable names,
    identifies each variable as either numeric or character and tells SAS where the
    variables are to be found (usually by column locations).


An example DATA step for the children’s reasoning ability data set, child1.dat (see
Figures 3.1 and 3.2) is shown:


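data child1;                                          /* name of the SAS data set */
infile 'a:child1.dat';                                /* location of the raw data file */
input caseid 1-3 ageyrs 5-6 sex 8 ses 10 raven 12;    /* variable names and columns */
run;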

The SAS data set created is called child1. This SAS data set name is specified in the first
line of SAS code. The data on children’s reasoning ability is located in a file called
‘child1.dat’ on the disk in drive a: (specified in the second line of SAS code). The
first variable is called caseid. It is numeric and the data values are to be found in columns
1 to 3 inclusive. If any variable were a character variable then a dollar sign ‘$’ would need
to be placed after the variable name (leave a blank space between the variable name and
the dollar sign). The second variable is called ageyrs, is numeric and the data values are
to be found in columns 5 and 6. The other variables are formatted in a similar way.
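
The dollar-sign rule for character variables can be illustrated with a short sketch. The variable name, its column positions and the data values here are hypothetical; the children's reasoning ability data set contains no such variable. The data are given inline with DATALINES, so the INFILE statement is not needed.

```sas
/* Hypothetical example: 'name' is a character variable, so a      */
/* dollar sign follows the variable name, separated by a blank.    */
data chardemo;
  input caseid 1-3 name $ 5-12 ageyrs 14-15;
  datalines;
001 Ann      06
002 Robert   07
;
run;

/* List the data set to check that each variable was read from     */
/* the intended columns.                                           */
proc print data=chardemo;
run;
```

Without the dollar sign SAS would try to read the values in columns 5 to 12 as numbers and would set name to missing for both observations.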


SAS Procedure Step

The statistical procedure illustrated here is PROC COMPARE. This procedure matches
variables and observations in what is called the base data set, here child1.dat, with the
same variables and observations in a comparison data set, in this example child2.dat. The
raw data was first entered using a DOS text editor into the data set child1.dat. The data

