William Greene
11.6.1 Heterogeneity and the negative binomial model
The Poisson model is typically only the departure point for the analysis of
count data. The simple model has (at least) two shortcomings that arise from
heterogeneity that is not explicitly incorporated in the model.
One easily remedied minor issue concerns the units of measurement of the data.
In the Poisson model (and negative binomial model below), the parameter $\lambda_i$ is
the expected number of events per unit of time. Thus there is a presumption in the
model formulation, e.g., the Poisson, that the same amount of time is observed
for each $i$. In a spatial context, such as measurements of the incidence of a disease
per group of $N_i$ persons, or the number of bomb craters per square mile in London
in 1940, the assumption would be that the same physical area or the same size
of population applies to each observation. Where this differs by individual, it will
introduce a type of heteroskedasticity in the model. The simple remedy is to modify
the model to account for the exposure, $T_i$, of the observation as follows:
$$
\operatorname{Prob}(y_i = j \mid x_i, T_i) = \frac{\exp(-T_i\phi_i)\,(T_i\phi_i)^{j}}{j!}, \quad \phi_i = \exp(x_i'\beta), \quad j = 0, 1, \ldots.
$$
The original model is returned if we write $\lambda_i = \exp(x_i'\beta + \ln T_i)$. Thus, when the
exposure differs by observation, the appropriate accommodation is to include the
log of exposure in the regression part of the model with a coefficient of 1.0. (For
less than obvious reasons, the term “offset variable” is commonly associated with
the exposure variable $T_i$.) Note that if $T_i$ is the same for all $i$, $\ln T$ will simply vanish
into the constant term of the model (assuming one is included in $x_i$).
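As a sketch of how the exposure enters estimation, the following Python fragment simulates counts with observation-specific exposure and fits the Poisson model by maximum likelihood, adding $\ln T_i$ to the index with its coefficient fixed at one. The simulated data, the parameter values, and the function names (`poisson_negloglik`, `poisson_grad`, `fit_poisson_with_exposure`) are illustrative assumptions, not part of the original text; NumPy and SciPy are assumed to be available.

```python
import numpy as np
from scipy.optimize import minimize


def poisson_negloglik(beta, X, y, offset):
    """Average negative Poisson log-likelihood, omitting the ln(y!) constant."""
    eta = X @ beta + offset              # x_i'beta + ln T_i
    return np.mean(np.exp(eta) - y * eta)


def poisson_grad(beta, X, y, offset):
    """Gradient of the average negative log-likelihood with respect to beta."""
    eta = X @ beta + offset
    return X.T @ (np.exp(eta) - y) / len(y)


def fit_poisson_with_exposure(X, y, T):
    """Poisson MLE with ln(T) as an offset (coefficient fixed at 1.0)."""
    offset = np.log(T)
    start = np.zeros(X.shape[1])
    result = minimize(poisson_negloglik, start, args=(X, y, offset),
                      jac=poisson_grad, method="BFGS")
    return result.x


# Hypothetical simulated data: a constant, one regressor, unequal exposure.
rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
T = rng.uniform(0.5, 5.0, size=n)        # exposure differs by observation
beta_true = np.array([0.2, 0.5])
y = rng.poisson(T * np.exp(X @ beta_true))

beta_hat = fit_poisson_with_exposure(X, y, T)
print("true beta:     ", beta_true)
print("estimated beta:", beta_hat)
```

Fixing the coefficient on $\ln T_i$ at one, rather than estimating it, is what distinguishes an offset from an ordinary regressor in this sketch.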
The less straightforward restriction of the Poisson model is that $E[y_i \mid x_i] =
\operatorname{Var}[y_i \mid x_i]$. This equidispersion assumption is a major shortcoming. Observed data
rarely, if ever, display this feature. The very large amount of research activity on
functional forms for count models is often focused on testing for equidispersion
and building functional forms that relax this assumption.
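A simple descriptive check of equidispersion is the ratio of the sample variance to the sample mean, which should be close to one for a homogeneous Poisson sample. The sketch below is only an illustration under stated assumptions: the helper `dispersion_ratio`, the rates, and the two-group design are hypothetical, and the ratio is a diagnostic for a sample without regressors, not one of the formal tests referred to above.

```python
import numpy as np


def dispersion_ratio(y):
    """Sample variance divided by sample mean; near 1 under equidispersion."""
    y = np.asarray(y, dtype=float)
    return y.var(ddof=1) / y.mean()


rng = np.random.default_rng(1)
n = 100_000

# Homogeneous sample: every observation shares the same rate.
y_equal = rng.poisson(3.0, size=n)

# Heterogeneous sample: the rate is 1.0 or 5.0 with equal probability,
# but group membership is not observed by the analyst (assumed design).
rates = rng.choice([1.0, 5.0], size=n)
y_hetero = rng.poisson(rates)

ratio_equal = dispersion_ratio(y_equal)
ratio_hetero = dispersion_ratio(y_hetero)
print("variance/mean, homogeneous:  ", ratio_equal)
print("variance/mean, heterogeneous:", ratio_hetero)
```

Both samples have the same mean, but the second one adds the variance of the unobserved rate to the Poisson variance, so its ratio exceeds one.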
The overdispersion found in observed data can be attributed to omitted
heterogeneity in the Poisson model. A more complete regression specification
would be:
$$
E[y_i \mid x_i, h_i] = h_i\lambda_i = h_i\exp(x_i'\beta) = \exp(x_i'\beta + \varepsilon_i), \quad h_i = \exp(\varepsilon_i),
$$
where the heterogeneity, $h_i$, has mean one and non-zero variance. Two candidates
for the distribution of $\varepsilon_i$ have dominated the literature: the log-normal model
discussed later and the log-gamma model. The most common specification is the
log-gamma model, which derives from the gamma variable:
$$
f[h_i] = \frac{\theta^{\theta}}{\Gamma(\theta)}\exp(-\theta h_i)\,h_i^{\theta-1}, \quad h_i \ge 0.^{25}
$$
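The density above can be cross-checked numerically. The sketch below assumes it corresponds to SciPy's gamma distribution with shape `a = theta` and `scale = 1/theta`; that mapping is our reading of the formula, and the function name `heterogeneity_pdf` and the value of `theta` are illustrative.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma


def heterogeneity_pdf(h, theta):
    """theta^theta / Gamma(theta) * exp(-theta*h) * h^(theta-1), via logs."""
    h = np.asarray(h, dtype=float)
    log_pdf = (theta * np.log(theta) - gammaln(theta)
               - theta * h + (theta - 1.0) * np.log(h))
    return np.exp(log_pdf)


theta = 2.5                                      # illustrative value
grid = np.linspace(0.05, 6.0, 50)                # strictly positive points
reference = gamma(a=theta, scale=1.0 / theta)    # assumed parameterization

max_gap = np.max(np.abs(heterogeneity_pdf(grid, theta) - reference.pdf(grid)))
print("largest pdf difference:", max_gap)
print("mean:", reference.mean(), "variance:", reference.var())
```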
This gamma-distributed random variable has mean 1.0 and variance $1/\theta$. (A separate
variance parameter is not identified; the scaling in the model is, once again,
absorbed by the coefficient vector.) If we write the Poisson–log-gamma model as:
$$
f(y_i \mid x_i, h_i) = \frac{\exp(-h_i\lambda_i)\,(h_i\lambda_i)^{y_i}}{\Gamma(y_i + 1)},
$$