Basic Statistics


so that they can determine what data actually is available on the medical records and
if it is recorded in a consistent manner. This pre-screening of the data can help avoid
problems showing up after the work of entering the data is done. Are the various
physicians who fill out the medical records doing it in a similar manner? Does the
form that the data are going to be recorded on match what is in the records? Are there
numerous missing values in the records? Have the measurements been made in the
way that the investigators expected them to be taken?
If the data are obtained from patients, are the patients actually able and willing
to respond to the questions the researchers want to ask them? Does the wording of
the questions have to be altered so that it is understandable to the patient? If a post-
operation survey is done, will the patients come into the doctor’s office to answer
questions or be examined? What can be done to increase the number willing to come
in? Can the visit include something that the patients would like to have, such as a
free checkup or some reward? If the patient will not come in, can the investigators at
least get answers to critical questions by phone?
Nonresponse has become a major problem in mail surveys. When the subject does
not respond at all, this is called unit nonresponse. Response rates can be increased
if the patients are convinced that their responses are important and they feel a sense
of obligation to the person requesting their responses. If subjects respond but refuse
to answer one or more questions, this is called item nonresponse. For example, some people
will not respond to a question concerning income. Sometimes the item response rate
for a question can be increased if answers such as “don’t know” or “does not apply”
are included. There are statistical techniques that help reduce the biases caused by
nonresponse, but they are beyond the scope of this book (see Groves et al. [2002]).
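
The distinction between unit and item nonresponse can be made concrete with a short Python sketch. Everything in it is an assumption for illustration: the records are made up, and the function names and the use of None to mark a missing subject or a skipped question are hypothetical conventions, not taken from the text.

```python
# Hypothetical survey returns (made-up values, for illustration only).
# A record of None means the subject never responded (unit nonresponse);
# a value of None inside a record means that question was skipped
# (item nonresponse).
responses = [
    {"age": 54, "income": 3},
    {"age": 61, "income": None},
    None,
    {"age": 47, "income": 2},
    None,
]


def unit_response_rate(records):
    """Fraction of sampled subjects who returned the survey at all."""
    if not records:
        raise ValueError("no subjects were sampled")
    returned = [r for r in records if r is not None]
    return len(returned) / len(records)


def item_response_rate(records, item):
    """Among subjects who responded, the fraction who answered `item`."""
    returned = [r for r in records if r is not None]
    if not returned:
        raise ValueError("no subjects responded")
    answered = [r for r in returned if r.get(item) is not None]
    return len(answered) / len(returned)


print(unit_response_rate(responses))
print(item_response_rate(responses, "income"))
```

Here three of the five sampled subjects responded, and two of those three answered the income question. Computing the item rate only among respondents is one common choice; dividing by all sampled subjects instead would mix the two kinds of nonresponse.
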
Performing a pilot study where one goes through the entire study on a few cases
is often recommended in large studies to work out potential problems. In surveys, it
is especially important to see if people will respond to the survey and what questions
they have difficulty with.


3.2 DATA ENTRY


Once the investigators know what information they want and how they are going
to obtain it, they need to decide how they will assign names and attributes to the
data. Each type of observation should be given a name. For example, if one is
studying systolic blood pressure of males by age, one could have a small data set that
contains ID, age, systolic (an abbreviation for systolic blood pressure), gender and
smoke (smoking status). Note that abbreviated words are often used to identify the
variables.
In such a data set, each row represents a new case and each column a different type of
observation (called a variable). For gender, 1 = male and 2 = female. For smoke,
1 = current smoker, 2 = former smoker, and 3 = nonsmoker.
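
As a minimal sketch of this layout in Python, the variable names and codes below follow the text, while the data values, the codebook dictionaries, and the helper functions decode and invalid_codes are hypothetical and included only for illustration.

```python
# Codebooks for the coded variables, using the codes given in the text.
GENDER = {1: "male", 2: "female"}
SMOKE = {1: "current smoker", 2: "former smoker", 3: "nonsmoker"}
CODEBOOKS = {"gender": GENDER, "smoke": SMOKE}

# Made-up rows for illustration: each row is one case,
# each key is one variable.
rows = [
    {"ID": 1, "age": 30, "systolic": 120, "gender": 1, "smoke": 3},
    {"ID": 2, "age": 45, "systolic": 134, "gender": 2, "smoke": 1},
    {"ID": 3, "age": 52, "systolic": 141, "gender": 1, "smoke": 2},
]


def decode(row):
    """Return a copy of the row with numeric codes replaced by labels."""
    out = dict(row)
    for name, codebook in CODEBOOKS.items():
        out[name] = codebook[row[name]]
    return out


def invalid_codes(row):
    """Return the names of coded variables whose value is not in the codebook."""
    return [name for name, codebook in CODEBOOKS.items()
            if row[name] not in codebook]


for row in rows:
    print(decode(row))
```

Keeping the codes in one place makes it easy to check entered data for values that should not occur, for example a smoke code of 4.
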
Each person or item should be given an ID number. In medical studies,
names are often not used, for confidentiality reasons. If the investigator wishes to
compare the results for two or more groups, a variable should be entered that represents
