along with the type and number of responses to be
considered. For example, it might be decided that
every physical movement or gesture will be
recorded for 1 minute every 4 minutes. The total
observational time might consist of a 20-minute
recess period for kindergarten children. This
means that every 4 minutes the child would be
observed for 1 minute and all physical movements
recorded. These movements would then be coded
or rated for the variable under study (e.g., aggres-
sion, problem solving, or dependence).
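As an illustration, the time-sampling schedule in this example can be sketched in a few lines of Python. This is only a minimal sketch: the function name `observation_windows` is hypothetical, and it assumes that observation takes place during the first minute of each 4-minute cycle, a detail the example above leaves open.

```python
def observation_windows(total_minutes, cycle_minutes, observe_minutes):
    """Return (start, end) minute offsets for each observation window.

    Assumption: the observer watches at the START of every cycle.
    """
    if not 0 < observe_minutes <= cycle_minutes:
        raise ValueError("observe_minutes must be between 0 and cycle_minutes")
    windows = []
    start = 0
    # Keep only windows that fit entirely inside the observation period.
    while start + observe_minutes <= total_minutes:
        windows.append((start, start + observe_minutes))
        start += cycle_minutes
    return windows


# The 20-minute recess, observed for 1 minute every 4 minutes.
windows = observation_windows(total_minutes=20, cycle_minutes=4, observe_minutes=1)
observed_minutes = sum(end - start for start, end in windows)
print(windows)
print(observed_minutes)
```

Under these assumptions the schedule yields five 1-minute windows, so only 5 of the 20 recess minutes are actually observed for each child.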
In addition to the units of analysis chosen, the
specific form that the ratings will take must also be
decided. One could decide to record behaviors
along a dimension of intensity: How strong was
the aggressive behavior? One might also include a
duration record: How long did the behavior last? Or
one might use a simple frequency count: How many
times in a designated period did the behavior under
study occur?
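The three rating forms just described (intensity, duration, and frequency) can all be derived from the same set of coded records. The following Python sketch is illustrative only: the `CodedBehavior` record, the 1-to-5 intensity scale, and the sample data are hypothetical and are not drawn from any particular coding system.

```python
from dataclasses import dataclass


@dataclass
class CodedBehavior:
    onset_s: float    # seconds from start of the observation window
    offset_s: float   # seconds from start of the observation window
    intensity: int    # hypothetical rating scale: 1 (mild) to 5 (severe)


def summarize(events):
    """Summarize coded behaviors as frequency, duration, and intensity."""
    frequency = len(events)
    total_duration_s = sum(e.offset_s - e.onset_s for e in events)
    # Mean intensity is undefined when no behavior was recorded.
    mean_intensity = (
        sum(e.intensity for e in events) / frequency if frequency else None
    )
    return {
        "frequency": frequency,
        "total_duration_s": total_duration_s,
        "mean_intensity": mean_intensity,
    }


# Made-up records of aggressive behavior within one observation window.
events = [
    CodedBehavior(onset_s=5.0, offset_s=8.0, intensity=2),
    CodedBehavior(onset_s=20.0, offset_s=21.5, intensity=4),
    CodedBehavior(onset_s=40.0, offset_s=45.5, intensity=3),
]
summary = summarize(events)
print(summary)
```

Recording onset, offset, and intensity for every instance lets the same data answer all three questions, at the cost of a heavier coding burden than a simple tally.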
Beyond this, a scoring procedure must be
developed. Such procedures can range from making
checkmarks on a sheet of paper attached to a clip-
board to the use of counters, stopwatches, timers,
and even laptop computers. All raters, of course,
will employ the same procedure.
Observer Error. No one is perfect. Observers
must be monitored from time to time to ensure
the accuracy of their reports. Sometimes they sim-
ply miss things or else believe they have observed
things that never really happened. A child’s yell
may be accidentally attributed to the wrong child.
Or perhaps the yell is coded as verbal aggression
when actually it represented a kind of camaraderie.
In other cases, it may not be error so much as bias.
For some reason, an observer may not like a person
in a family being observed. The observer may then
be prone to provide a less than flattering rating for
that person whenever the opportunity arises. It
sometimes happens that a person being observed
does something early in the observational
sequence—solving a problem quickly, perhaps, or
making an intelligent remark—that the observer
reacts to. The result is a kind of halo effect so that
later the observer is more likely to assign favorable
ratings to that person. Whatever the nature of the
potential bias or error, it is important to hold careful
training sessions for observers in advance, along
with periodic review sessions, to help keep these
sources of trouble in check.
Reactivity. Another factor affecting the validity
of observations is reactivity. Patients or study participants
sometimes react to the fact that they are being
observed by changing the way they behave. The
talkative person suddenly becomes quiet. The com-
plaining spouse suddenly becomes the epitome of
self-sacrifice. Sometimes an individual may even
feel the need to apologize for the dog by saying,
“He never does that when he is alone with us.” In
any case, reactivity can severely hamper the validity
of observations because it makes the observed
behavior unrepresentative of what normally occurs.
The real danger of reactivity is that the observer
may not recognize its presence. If observed behav-
ior is not a true sample, this affects the extent to
which one can generalize from this instance of
behavior. Then, too, observers may unwittingly
interfere with or influence the very behavior they
are sent to observe. In the case of sexual dysfunc-
tion, for example, Conte (1986) has noted that
behavioral ratings are so intrusive that clinicians
usually have to rely on self-report methods. However,
even self-monitoring may be subject to
reactivity (Korotitsch & Nelson-Gray, 1999).
Ecological Validity. One of the biggest problems
in psychology (and one that has never been fully
resolved) is what Brunswik (1947) referred to many
years ago as ecological validity. The basic question is
whether or not clinicians do obtain really representative
samples of behavior. Is the client’s behavior today
typical, or is it the product of some uncharacteristic
stimulus? Such a question goes beyond simple reac-
tivity. It asks whether or not observers have a large
enough sample to assure that their observations will
be truly representative. It is doubtful that any one
piece of behavior will be typical of a given client.
This being so, is the sample of 4 or 10 or 20 instances
the proper number? All areas of psychology have
long grappled with this problem. Is one experiment