Basic Statistics


32 COLLECTING AND ENTERING DATA


group identity. In this example, one might want to compare the results for males and
females, so gender needs to be included.


Table 3.1 Study Results

ID   Age   Systolic   Gender   Smoke
1    57    123        1        1
2    71    137        2        2
3    35    128        1        3
4    60    155        2        1

The data may be collected by direct observation, by interviews, or from records.
Direct observation and interviewing take longer, and the observers or interviewers
may need to be trained and monitored occasionally to ensure accurate information.
Obtaining data from records is less expensive and can be a cut-and-paste type of
procedure, but the investigator sometimes has little control over what was recorded
or how. The data may not have been collected for research purposes. In other cases, when
the records are obscure, it may help to have two investigators read them separately, compare
their results, and then reach an agreement on the items on which they originally
disagreed.
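
The comparison of two independent readings can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first reader's values repeat the first three rows of Table 3.1, the second reader's values are hypothetical, and the function name find_disagreements is invented for this example.

```python
# Readings of the same records by two investigators, keyed by subject ID.
# reader_a repeats rows 1-3 of Table 3.1; reader_b is hypothetical.
reader_a = {
    1: {"age": 57, "systolic": 123},
    2: {"age": 71, "systolic": 137},
    3: {"age": 35, "systolic": 128},
}
reader_b = {
    1: {"age": 57, "systolic": 123},
    2: {"age": 71, "systolic": 187},  # disagrees on systolic
    3: {"age": 53, "systolic": 128},  # disagrees on age
}


def find_disagreements(a, b):
    """Return (id, variable, value_a, value_b) for every field that differs."""
    diffs = []
    for subject_id in sorted(set(a) | set(b)):
        rec_a = a.get(subject_id, {})
        rec_b = b.get(subject_id, {})
        for var in sorted(set(rec_a) | set(rec_b)):
            if rec_a.get(var) != rec_b.get(var):
                diffs.append((subject_id, var, rec_a.get(var), rec_b.get(var)))
    return diffs


disagreements = find_disagreements(reader_a, reader_b)
for row in disagreements:
    print(row)
```

Only the listed items need to be rechecked against the original records; the program finds the discrepancies but does not decide which reading is correct.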
Before entering the data into a statistical package, the investigators have to know
how they will name each variable, what type of variable it is, the placement of the
decimal point, how missing values are to be identified, and so on. The identification of
missing values depends on which statistical package one intends to use, so reading the
data entry instructions in the manual or using the HELP statement may be necessary.
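
One way to record these decisions before any data are entered is a small codebook. The sketch below is in Python rather than in any of the statistical packages, and the kinds and decimal places assigned to the variables of Table 3.1 are assumptions made for illustration; the names Variable, CODEBOOK, and parse_row are likewise invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str      # name the variable will have in the data set
    kind: str      # "identifier", "continuous", or "categorical"
    decimals: int  # digits kept after the decimal point


# Assumed codebook for Table 3.1 (kinds and decimals are illustrative).
CODEBOOK = [
    Variable("ID", "identifier", 0),
    Variable("Age", "continuous", 0),
    Variable("Systolic", "continuous", 0),
    Variable("Gender", "categorical", 0),
    Variable("Smoke", "categorical", 0),
]


def parse_row(line):
    """Split one whitespace-delimited row and convert each field per the codebook."""
    fields = line.split()
    if len(fields) != len(CODEBOOK):
        raise ValueError(f"expected {len(CODEBOOK)} fields, got {len(fields)}")
    record = {}
    for var, raw in zip(CODEBOOK, fields):
        if var.kind == "continuous":
            record[var.name] = round(float(raw), var.decimals)
        else:
            record[var.name] = int(raw)  # keep the numeric code as entered
    return record


first = parse_row("1 57 123 1 1")
print(first)
```

A row with too few or too many fields raises an error instead of being entered silently, which is the kind of check a codebook makes possible.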
For example, in Minitab the DATA window displays a spreadsheet form so that
the user can enter the data directly. If the value of a variable is missing for a case,
an asterisk is entered in place of the value for that case and variable to identify the
result as missing. In SAS, the space where the answer goes can either be left blank
or be filled with a period. SPSS allows the user to designate what they want as
a missing value. Stata uses periods.
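
The effect of these conventions can be imitated outside the packages. The following Python sketch, which is not code from any of the packages, treats each package's marker as described above as a missing value when reading a column of plain-text entries. The names MISSING_TOKENS and read_value and the sample columns are hypothetical. SPSS is left out because the user chooses the code there.

```python
# Missing-value markers as described in the text.
MISSING_TOKENS = {
    "minitab": {"*"},
    "sas": {"", "."},  # blank field or period
    "stata": {"."},
}


def read_value(raw, package):
    """Return a float, or None when raw is the package's missing marker."""
    token = raw.strip()
    if token in MISSING_TOKENS[package]:
        return None
    return float(token)


# Hypothetical columns of systolic readings as they might be typed in.
minitab_column = ["123", "*", "128", "155"]
sas_column = ["123", ".", "128", " "]

minitab_values = [read_value(v, "minitab") for v in minitab_column]
sas_values = [read_value(v, "sas") for v in sas_column]
print(minitab_values)
print(sas_values)
```

An asterisk read under the SAS or Stata rules would raise an error rather than be taken as a number, which shows why the marker must match the package in use.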
Another common method of entering data is to use an EXCEL spreadsheet. Most
statistical packages will allow importing EXCEL files. For more details on the data
entry for the commonly used programs, see Afifi et al. [2004]. The statistical programs
all provide information on entering data in their manuals.
The data that are entered may be more complicated in surveys. Suppose that the
questionnaire includes the question: Are you a current smoker? The answer could
be either yes or no. For those respondents who are current smokers, there could be a
series of questions concerning how much and for how long. For the nonsmokers these
questions will be skipped and the respondent will then be asked a question on the
next topic that does not pertain to smoking. SPSS, which was developed originally
with surveys in mind, has a skip and fill option that handles this type of situation.
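
The logic of such a skip pattern can be sketched generically in Python. This is not the SPSS option itself, only an illustration of the idea; the question names, the "NA" code for a question skipped by design, and the function apply_skip_pattern are all assumptions.

```python
NOT_APPLICABLE = "NA"  # assumed code for a question skipped by design
FOLLOW_UPS = ("cigs_per_day", "years_smoked")  # hypothetical follow-up questions


def apply_skip_pattern(response):
    """Fill the smoking follow-ups with NOT_APPLICABLE for nonsmokers.

    Raises ValueError if a nonsmoker already has an answer to a follow-up,
    since that indicates an entry error.
    """
    filled = dict(response)
    if filled["current_smoker"] == "no":
        for question in FOLLOW_UPS:
            if filled.get(question) not in (None, NOT_APPLICABLE):
                raise ValueError(f"nonsmoker has an answer for {question}")
            filled[question] = NOT_APPLICABLE
    return filled


nonsmoker = {"current_smoker": "no"}
smoker = {"current_smoker": "yes", "cigs_per_day": 10, "years_smoked": 20}

nonsmoker_filled = apply_skip_pattern(nonsmoker)
smoker_filled = apply_skip_pattern(smoker)
print(nonsmoker_filled)
print(smoker_filled)
```

Coding skipped questions as not applicable keeps them distinct from questions that a smoker was asked but did not answer.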
Another possible type of entry is form entry, in which the results are entered on a
form similar to the one used to collect the data (see Afifi et al. [2004] for a
discussion of this method).
