The variable in the infant dataset defining the cluster (infant) is IDNO. The variable
defining the order of measurement within a cluster is MONTH. These variables can be
set and then used in subsequent analyses with the xtset command. The code follows:
xtset idno month
Now when other xt commands are run using this dataset, the cluster and time
variables do not have to be restated. The command xtdescribe can be typed to see
descriptive measures of the cluster variable.
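As a minimal sketch of this setup step, the commands can be run in sequence as shown below. It assumes the infant dataset is already loaded in memory. Typing xtset with no arguments is not mentioned in the text above; it is included only as an optional check that redisplays the current settings.

```stata
* Declare the cluster (panel) variable and the time variable once
xtset idno month

* Optional check: xtset with no arguments redisplays the current settings
xtset

* Describe the pattern of observations within clusters
xtdescribe
```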
Next, a GEE model is demonstrated with the infant care dataset. GEE models can be
executed with the xtgee command in Stata.
The model is stated as follows:
\[
\operatorname{logit} P(\mathrm{OUTCOME} = 1 \mid \mathbf{X}) = \beta_0 + \beta_1\,\mathrm{BIRTHWGT} + \beta_2\,\mathrm{GENDER} + \beta_3\,\mathrm{DIARRHEA}
\]
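Because the logit is the log odds, the same model can be rewritten on the probability scale by inverting the link function. This is the standard restatement of a logistic model and uses only the symbols already defined:

\[
P(\mathrm{OUTCOME} = 1 \mid \mathbf{X}) = \frac{1}{1 + \exp\left[-\left(\beta_0 + \beta_1\,\mathrm{BIRTHWGT} + \beta_2\,\mathrm{GENDER} + \beta_3\,\mathrm{DIARRHEA}\right)\right]}
\]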
The code to run this model with an AR1 correlation structure is:
xtgee outcome birthwgt gender diarrhea, family(binomial) link(logit) corr(ar1) vce(robust)
The xtgee command is followed by the dependent variable and then the list of
independent variables. The link() and family() options define the link function and
the distribution of the response. The corr() option allows the correlation structure to
be specified. The vce(robust) option requests empirically based standard errors. The
options corr(ind), corr(exc), corr(sta4), and corr(uns) can be used to request an
independent, exchangeable, stationary 4-dependent, and unstructured working
correlation structure, respectively.
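As a minimal sketch of how these options are used, the same model can be refit under a different working correlation structure by changing only the corr() option. The example assumes the infant dataset is in memory and that xtset idno month has already been run. The estat wcorrelation command, which displays the estimated working correlation matrix after xtgee, is not part of the text above and is shown only as an optional check.

```stata
* Refit the model with an exchangeable working correlation structure
xtgee outcome birthwgt gender diarrhea, family(binomial) link(logit) corr(exc) vce(robust)

* Optional check: display the estimated working correlation matrix
estat wcorrelation

* Refit with an independent working correlation structure for comparison
xtgee outcome birthwgt gender diarrhea, family(binomial) link(logit) corr(ind) vce(robust)
```

Comparing the coefficient estimates and empirical standard errors across these fits gives a rough sense of how sensitive the results are to the choice of working correlation structure.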
The output using the AR1 correlation structure follows:
GEE population-averaged model               Number of obs      =      1203
Group and time vars:      idno month        Number of groups   =       136
Link:                          logit        Obs per group: min =         5
Family:                     binomial                       avg =       8.8
Correlation:                   AR(1)                       max =         9
                                            Wald chi2(3)       =      2.73
Scale parameter:                   1        Prob > chi2        =    0.4353
                     (standard errors adjusted for clustering on idno)
outcome Coef.