Ordinal Logistic Regression
Ordinal logistic regression is demonstrated with the cancer dataset using the ologit
command. For this analysis, the variable GRADE is the response variable. GRADE
has three levels, coded 0 for well-differentiated, 1 for moderately differentiated, and
2 for poorly differentiated.
The model is stated as follows:
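$$\ln\left[\frac{P(\mathrm{GRADE} \le g^* \mid X)}{P(\mathrm{GRADE} > g^* \mid X)}\right] = \alpha^*_{g^*} - \beta^*_1\,\mathrm{RACE} - \beta^*_2\,\mathrm{ESTROGEN}, \qquad g^* = 0, 1$$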
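To make the formulation concrete, the Python sketch below turns the two cumulative logits into the three category probabilities for GRADE. The parameter values are hypothetical, chosen only for illustration; they are not estimates from the cancer data. The sketch assumes, as the model requires, that the intercepts increase (α*_0 < α*_1), so that P(GRADE ≤ 0 | X) ≤ P(GRADE ≤ 1 | X).

```python
import math

def logistic(z: float) -> float:
    """Inverse of the logit: maps a log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def grade_probabilities(alpha_star, beta_star, x):
    """Category probabilities under the alternative proportional odds model
    ln[P(GRADE <= g* | X) / P(GRADE > g* | X)] = alpha*_g* - sum_i beta*_i x_i.

    alpha_star: intercepts (alpha*_0, alpha*_1), assumed increasing
    beta_star:  slopes (beta*_1, beta*_2) for the two covariates
    x:          covariate values (x_1, x_2)
    """
    linear = sum(b * xi for b, xi in zip(beta_star, x))
    # Cumulative probabilities P(GRADE <= 0 | X) and P(GRADE <= 1 | X);
    # note the minus sign in front of the beta terms.
    cum = [logistic(a - linear) for a in alpha_star]
    cum.append(1.0)  # P(GRADE <= 2 | X) = 1
    # Differencing adjacent cumulative probabilities gives P(GRADE = g | X).
    return [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, len(cum))]

# Hypothetical parameter values, for illustration only.
alpha_star = (-0.5, 1.0)
beta_star = (0.4, -0.8)
probs = grade_probabilities(alpha_star, beta_star, x=(1, 0))
print([round(p, 4) for p in probs])
```

Because a single set of slopes is shared by both cumulative logits, only the intercepts change from one cut point to the next; this is the proportional odds assumption.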
This is the alternative formulation of the proportional odds model discussed in
Chap. 13. In contrast to the formulation presented in the SAS section of the appendix,
Stata, like SPSS, models the odds that the outcome is in a category less than or
equal to category g*. The other difference in the alternative formulation is the
negative sign before each beta coefficient. These two differences “cancel out” for the
beta coefficients, so that $\beta^*_i = \beta_i$; for the intercepts, however,
$\alpha^*_{g^*} = -\alpha_g$ (with $g = g^* + 1$), where $\alpha_g$ and $\beta_i$ denote,
respectively, the intercept and the ith regression coefficient in the model run using SAS.
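The sign relationship can be checked numerically. The Python sketch below assumes the SAS formulation models P(GRADE ≥ g | X) with intercepts α_1, α_2 and plus signs on the β's, uses hypothetical parameter values, and derives the Stata-style parameters by setting β*_i = β_i and α*_{g*} = −α_{g*+1}. The index shift reflects the fact that the event GRADE > g* is the same as GRADE ≥ g* + 1. Both parameterizations then give identical probabilities.

```python
import math

def logistic(z: float) -> float:
    """Inverse of the logit."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical SAS-style parameters: logit P(GRADE >= g | X) = alpha_g + beta . x
alpha = {1: 0.5, 2: -1.3}   # alpha_1 > alpha_2 keeps the model coherent
beta = (0.4, -0.8)          # slopes for (x_1, x_2)

# Stata-style parameters implied by the sign relationships:
# beta*_i = beta_i and alpha*_{g*} = -alpha_{g*+1}.
beta_star = beta
alpha_star = {g_star: -alpha[g_star + 1] for g_star in (0, 1)}

def p_ge_sas(g, x):
    """P(GRADE >= g | X) under the SAS-style formulation."""
    return logistic(alpha[g] + sum(b * xi for b, xi in zip(beta, x)))

def p_le_stata(g_star, x):
    """P(GRADE <= g* | X) under the Stata-style formulation."""
    return logistic(alpha_star[g_star] - sum(b * xi for b, xi in zip(beta_star, x)))

for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    for g_star in (0, 1):
        # GRADE <= g* is the complement of GRADE >= g* + 1.
        assert math.isclose(p_le_stata(g_star, x), 1.0 - p_ge_sas(g_star + 1, x))
print("Both parameterizations give identical probabilities.")
```

The check works because logistic(−z) = 1 − logistic(z). Reversing the direction of the odds flips the sign of the whole linear predictor, and the explicit minus signs in the alternative formulation flip the slopes back, leaving only the intercepts negated.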
The code to run the proportional odds model and its output follow:
ologit grade race estrogen

Ordered logit estimates                          Number of obs  =     286
                                                 LR chi2(2)     =   19.71
                                                 Prob > chi2    =  0.0001
Log likelihood = -287.60598                      Pseudo R2      =  0.0331

grade        Coef.       Std. Err.     z      P>|z|    [95% Conf. Interval]
race         .4269798    .2726439     1.57    0.117    -.1073926    .9613521
estrogen    -.7763251    .2495253    -3.11    0.002   -1.265386    -.2872644
_cut1        .5107035    .2134462    (Ancillary parameters)
_cut2       1.274351     .2272768
Comparing this output with the corresponding SAS output shows that the coefficient
estimates are the same but the intercept estimates (labeled _cut1 and _cut2 in the Stata
output) differ: their signs are reversed because of the different formulations of the model.
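Because the β estimates coincide across the two formulations, exponentiating them gives the usual proportional odds ratios. The Python sketch below does that arithmetic on the ologit coefficients and 95% confidence limits; note that the estrogen coefficient is negative, since its confidence limits (−1.265386 and −.2872644) both lie below zero. Under the model as stated, exp(β*_i) is the odds ratio for a higher grade (GRADE > g*) versus a lower grade (GRADE ≤ g*) per one-unit increase in the covariate, assumed the same for g* = 0 and g* = 1. Stata can also report these odds ratios directly with ologit's or option.

```python
import math

# Coefficients and 95% confidence limits from the ologit output:
# (estimate, lower limit, upper limit).
estimates = {
    "race":     (0.4269798, -0.1073926, 0.9613521),
    "estrogen": (-0.7763251, -1.265386, -0.2872644),
}

# Exponentiate each coefficient and its limits to obtain the odds ratio
# for a higher versus lower grade per one-unit increase in the covariate.
odds_ratios = {
    name: tuple(math.exp(v) for v in values)
    for name, values in estimates.items()
}

for name, (or_hat, lo, hi) in odds_ratios.items():
    print(f"{name:9s} OR = {or_hat:.3f}  95% CI ({lo:.3f}, {hi:.3f})")
```

The race interval includes 1, consistent with its p-value of 0.117, whereas the estrogen interval lies entirely below 1, consistent with its p-value of 0.002.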
Modeling Correlated Data with Dichotomous Outcomes
Stata has a series of commands beginning with the prefix xt that are designed for the
analysis of longitudinal studies (sometimes called panel studies) with correlated
outcomes. The first of the xt commands typically used in an analysis is the xtset
command. This command defines the cluster variable and, optionally, a time variable
indicating when each observation was made within the cluster. We demonstrate
some of the xt commands with the infant care dataset (infant.dta).
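Conceptually, declaring the cluster and time variables tells Stata how the rows group into clusters and how they are ordered within each cluster. The Python sketch below mimics that bookkeeping on a few hypothetical records; the record layout (cluster, time, outcome) and its values are invented for illustration and are not the variables in infant.dta. Rejecting a cluster that contains the same time value twice is analogous to, but not a reproduction of, the check xtset performs when a time variable is given.

```python
from collections import defaultdict

# Hypothetical records: (cluster id, time within cluster, dichotomous outcome).
records = [
    (2, 1, 0), (1, 2, 1), (1, 1, 0), (2, 2, 1), (1, 3, 1),
]

def declare_panel(rows):
    """Group rows by cluster and sort each cluster's observations by time."""
    panels = defaultdict(list)
    for cluster, time, outcome in rows:
        panels[cluster].append((time, outcome))
    for cluster, obs in panels.items():
        obs.sort()  # order observations by time within the cluster
        times = [t for t, _ in obs]
        # A time value may appear only once within a cluster.
        if len(times) != len(set(times)):
            raise ValueError(f"repeated time values within cluster {cluster}")
    return dict(panels)

panels = declare_panel(records)
for cluster in sorted(panels):
    print(cluster, panels[cluster])
```

Once the rows are grouped this way, the outcomes within a cluster can be treated as a correlated set rather than as independent observations, which is the structure the xt commands are designed to exploit.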
658 Appendix: Computer Programs for Logistic Regression