Logistic Regression: A Self-learning Text, Third Edition (Statistics in the Health Sciences)

EXAMPLE
e.g., $q = 3$, $n_3 = 4$, $\hat{P}(X_i)$: 0.30, 0.35, 0.40, 0.45

\[
E_{c3} = \sum_{i=1}^{n_3} \hat{P}(X_{i3}) = 0.30 + 0.35 + 0.40 + 0.45 = 1.50
\]

and $E_{nc3} = n_3 - E_{c3} = 4 - 1.50 = 2.50$

Step 5:

\[
HL = \sum_{q=1}^{Q} \frac{(O_{cq} - E_{cq})^2}{E_{cq}} + \sum_{q=1}^{Q} \frac{(O_{ncq} - E_{ncq})^2}{E_{ncq}}
\]

Q = 10 ⇒ 20 values in summation


Step 6:

$HL \sim$ approx $\chi^2_{Q-2}$ under $H_0$: Good fit

i.e., not enough evidence to indicate lack of fit

Q = 10 ⇒ df = Q – 2 = 8

V. Examples of the HL Statistic


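EXAMPLE
Evans County Data (n = 609)
(see previous chapters and Computer Appendix)

Model EC1 (no interaction):
\[
\operatorname{logit} P(X) = \alpha + \beta\,\mathrm{CAT} + \gamma_1\,\mathrm{AGEG} + \gamma_2\,\mathrm{ECG}
\]

Model EC2 (fully parameterized):
\[
\begin{aligned}
\operatorname{logit} P(X) = {} & \alpha + \beta\,\mathrm{CAT} + \gamma_1\,\mathrm{AGEG} + \gamma_2\,\mathrm{ECG} \\
& + \gamma_3\,\mathrm{AGEG} \times \mathrm{ECG} \\
& + \delta_1\,\mathrm{CAT} \times \mathrm{AGEG} \\
& + \delta_2\,\mathrm{CAT} \times \mathrm{ECG} \\
& + \delta_3\,\mathrm{CAT} \times \mathrm{AGEG} \times \mathrm{ECG}
\end{aligned}
\]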

For example, if the third decile contains four
subjects with predicted risks of 0.30, 0.35, 0.40,
and 0.45, then the expected number of cases
($E_{c3}$) would be their sum, 0.30 + 0.35 + 0.40 +
0.45 = 1.50 (regardless of whether or not a
subject is an observed case). The expected number
of noncases in the same decile ($E_{nc3}$) would be
4 − 1.50 = 2.50.
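The arithmetic for this decile can be sketched in a few lines of Python. The four predicted risks are the ones given in the text; the function name `expected_counts` is only illustrative.

```python
# Predicted risks for the four subjects in the third decile (from the text).
predicted_risks = [0.30, 0.35, 0.40, 0.45]


def expected_counts(risks):
    """Return (expected cases, expected noncases) for one group of subjects."""
    expected_cases = sum(risks)                      # sum of predicted risks
    expected_noncases = len(risks) - expected_cases  # group size minus expected cases
    return expected_cases, expected_noncases


ec3, enc3 = expected_counts(predicted_risks)
print(f"Ec3 = {ec3:.2f}, Enc3 = {enc3:.2f}")  # prints "Ec3 = 1.50, Enc3 = 2.50"
```

Note that the observed outcomes never enter this calculation; only the predicted risks and the group size do.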

In step 5, the HL statistic is calculated using
the formula at the left. This formula involves
summing Q values of the general form
$(O_q - E_q)^2 / E_q$ for cases and another Q values for
noncases. When Q = 10, the HL statistic therefore
involves 20 values in the summation.
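As a minimal sketch of the step 5 summation, the function below takes per-group observed cases, expected cases, and group sizes, and derives the noncase counts by subtraction. The function name and the two-group toy numbers are hypothetical and chosen only to keep the arithmetic easy to follow; they are not taken from the Evans County data.

```python
def hl_statistic(observed_cases, expected_cases, group_sizes):
    """Hosmer-Lemeshow statistic from per-group counts.

    observed_cases[q] : observed number of cases in group q (Ocq)
    expected_cases[q] : sum of predicted risks in group q (Ecq)
    group_sizes[q]    : number of subjects in group q (nq)
    """
    total = 0.0
    for o_c, e_c, n in zip(observed_cases, expected_cases, group_sizes):
        o_nc = n - o_c  # observed noncases (Oncq)
        e_nc = n - e_c  # expected noncases (Encq)
        # One term for cases and one for noncases per group.
        total += (o_c - e_c) ** 2 / e_c + (o_nc - e_nc) ** 2 / e_nc
    return total


# Hypothetical toy input with Q = 2 groups of four subjects each.
hl = hl_statistic(
    observed_cases=[1, 3],
    expected_cases=[1.5, 2.0],
    group_sizes=[4, 4],
)
print(f"HL = {hl:.4f}")  # prints "HL = 1.2667"
```

With Q groups the loop adds 2Q terms, which matches the count of 20 values when Q = 10.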

In step 6, the HL statistic is tested for significance
by comparing the computed HL value to
a percentage point of $\chi^2$ with $Q - 2$ degrees of
freedom. When Q = 10, therefore, the HL statistic
is approximately $\chi^2$ with 8 df.
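The comparison in step 6 can be sketched without a statistics library, because the chi-square upper-tail probability has a closed form when the degrees of freedom are even (as with 8 df when Q = 10). The helper name `chi2_upper_tail_even_df` is an assumption made for this illustration; in practice a statistical package would normally supply the P-value.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid only for even df.

    For df = 2k the upper tail equals exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^j / j! for j = 0
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


Q = 10
df = Q - 2  # 8 degrees of freedom

# 15.507 is the tabulated 5% point of chi-square with 8 df,
# so the tail probability here is close to 0.05.
p_at_critical = chi2_upper_tail_even_df(15.507, df)
print(round(p_at_critical, 3))
```

A large P-value means there is not enough evidence to indicate lack of fit; a small one suggests the model fits poorly.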

We now illustrate the use of the HL statistic
with the Evans County data (n = 609). This
dataset has been considered in previous chapters
and is described in detail in the Computer
Appendix. SAS's LOGISTIC procedure was used
for the computations.

In our first illustration, we fit the two models
shown at the left. The outcome variable is CHD
status (1 = case, 0 = noncase), and there are
three basic (i.e., main effect) binary predictors:
CAT (1 = high, 0 = low), AGEG (1 = age ≤ 55,
0 = age > 55), and ECG (1 = abnormal, 0 =
normal). Recall that the Evans County dataset
is described in the Computer Appendix.
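To make the difference between the two models concrete, the sketch below builds the covariate row each model would use for a subject with given CAT, AGEG, and ECG values. The function names are hypothetical, and no data or fitted coefficients are involved; the point is only that three binary predictors give eight distinct covariate patterns, and Model EC2 has eight parameters, one per pattern.

```python
from itertools import product


def ec1_row(cat, ageg, ecg):
    """Covariates for Model EC1: intercept plus the three main effects."""
    return [1, cat, ageg, ecg]


def ec2_row(cat, ageg, ecg):
    """Covariates for Model EC2: EC1 plus all two-way and three-way products."""
    return [
        1, cat, ageg, ecg,
        ageg * ecg,        # gamma_3 term
        cat * ageg,        # delta_1 term
        cat * ecg,         # delta_2 term
        cat * ageg * ecg,  # delta_3 term
    ]


# All combinations of the three binary predictors (CAT, AGEG, ECG).
patterns = list(product([0, 1], repeat=3))

print(len(patterns))            # 8 covariate patterns
print(len(ec1_row(0, 0, 0)))    # 4 parameters in EC1
print(len(ec2_row(0, 0, 0)))    # 8 parameters in EC2
```

Because EC2 has as many parameters as there are covariate patterns, it can reproduce the observed proportion in every pattern, whereas EC1 cannot in general.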

320 9. Assessing Goodness of Fit for Logistic Regression
