Statistical Methods for Psychology

the assumptions of normality and homogeneity of variance in arrays. Rather than classifying
people as improved or not improved, suppose that we could somehow measure their
disease outcomes more precisely. (For example, we could rate their condition on a 100-point
scale.) Then for a rating of SurvRate = 20, for example, we would have a whole distribution
of disease outcome scores; similarly for people with SurvRate = 30, SurvRate = 40,
etc. These distributions are shown schematically in Figure 15.10.
When we class someone as improved, we are simply saying that their disease outcome
score is sufficiently high for us to say that they fall in that category. They may be completely
cured, they may be doing quite a bit better, or they may be only slightly improved, but they at
least meet our criterion of “improved.” Similarly, someone else may have remained constant,
gotten slightly worse, or died, but in any event their outcome was below our decision point.
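The classification just described can be written compactly. The notation below is an assumption introduced only for illustration and does not come from the text: Y is the disease outcome score, x is the value of SurvRate, c is the cutoff score for "improved," \mu_x is the mean outcome for people with SurvRate = x, and \sigma is the standard deviation common to every array. If the outcome scores in each array are normal with equal variances, the chance of being classed as improved depends only on how far the cutoff lies from the array mean.

```latex
% Assumed notation (not from the text):
%   Y      = disease outcome score
%   x      = SurvRate
%   c      = cutoff score for "improved"
%   \mu_x  = mean outcome in the array at SurvRate = x
%   \sigma = standard deviation shared by all arrays
%   \Phi   = standard normal cumulative distribution function
\[
\text{Improved} =
\begin{cases}
1 & \text{if } Y \ge c \\
0 & \text{if } Y < c
\end{cases}
\]
\[
P(\text{Improved} = 1 \mid x) = P(Y \ge c \mid x)
  = 1 - \Phi\!\left(\frac{c - \mu_x}{\sigma}\right)
\]
```

As \mu_x rises with SurvRate, the quantity (c - \mu_x)/\sigma falls and the probability of being classed as improved climbs toward 1, which produces an S-shaped curve rather than a straight line.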
What we have here are called censored data. When I speak of censoring I’m not
talking about some nasty little man with a big black marker who blocks out things he
doesn’t want others to see. We are talking about a situation where something that is
above a cutoff is classed as a success, and something below the cutoff is classed as a
failure. It could be performance on a test, obtaining a qualifying time for the Boston
Marathon, or classifying an airline flight as “on time” or “late.” From this point of view,
logistic regression can be thought of as applying linear regression to censored data.
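The idea of a linear regression hidden behind a cutoff can be illustrated with a small simulation. This is only a sketch under stated assumptions: the intercept, slope, scale, and cutoff are invented values, the function names are hypothetical, and the scatter around the line is taken to follow a logistic distribution (rather than a normal one) because that choice makes the proportion classed as improved follow a logistic curve exactly.

```python
import math
import random


def logistic_probability(surv_rate, b0=20.0, b1=0.6, scale=10.0, cutoff=50.0):
    """Theoretical P(improved) when the outcome is b0 + b1*SurvRate plus
    logistic noise and 'improved' means the outcome exceeds the cutoff.
    All parameter values are illustrative assumptions."""
    z = (b0 + b1 * surv_rate - cutoff) / scale
    return 1.0 / (1.0 + math.exp(-z))


def simulate_improved(surv_rate, n, rng, b0=20.0, b1=0.6, scale=10.0, cutoff=50.0):
    """Simulate n continuous outcome scores at one SurvRate, censor them at
    the cutoff, and return the proportion classed as improved."""
    improved = 0
    for _ in range(n):
        # Inverse-CDF draw from a logistic distribution centered on 0.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noise = scale * math.log(u / (1.0 - u))
        outcome = b0 + b1 * surv_rate + noise  # the unobserved linear model
        if outcome > cutoff:                   # all we record is success/failure
            improved += 1
    return improved / n


if __name__ == "__main__":
    rng = random.Random(1)
    for surv_rate in range(0, 101, 10):
        observed = simulate_improved(surv_rate, 20000, rng)
        expected = logistic_probability(surv_rate)
        print(f"SurvRate={surv_rate:3d}  simulated={observed:.3f}  logistic={expected:.3f}")
```

At each SurvRate the simulated proportion of successes should lie close to the logistic curve, even though the underlying outcome scores were generated from a straight line.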

Chapter 15 Multiple Regression


Figure 15.10 Disease outcome as a function of SurvRate (vertical axis: disease outcome; marked on the figure: cutoff score for improved, line through means)

Figure 15.9 More appropriate regression line for predicting outcome (plot title: NewOut by SurvRate; vertical axis: NewOut; horizontal axis: SurvRate, 0 to 100)