function is:
$$
L(\theta,\pi\mid y)=\prod_{i=1}^{n}\sum_{j=1}^{C}\pi_j^{d_{ij}}\,[f_j(y;\theta_j)]^{d_{ij}},\qquad 0<\pi_j<1,\quad \sum_{j=1}^{C}\pi_j=1. \tag{15.34}
$$
If $\pi_j$, $j=1,\ldots,C$, is given, the posterior probability that observation $y_i$ belongs to the population $j$, $j=1,2,\ldots,C$, is denoted $z_{ij}$, and $E[z_{ij}]=\pi_j$.
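For concreteness, the sketch below, assuming a hypothetical two-component Poisson mixture with illustrative function names, evaluates the mixture log-likelihood $\sum_{i}\ln\sum_{j}\pi_j f_j(y_i;\theta_j)$ and the posterior probabilities $z_{ij}$:

```python
# A minimal sketch, assuming a hypothetical two-component Poisson mixture;
# the function names are illustrative, not from the chapter's application.
import numpy as np
from scipy.stats import poisson

def mixture_loglik(y, pi, lam):
    """Mixture log-likelihood: sum_i ln sum_j pi_j f_j(y_i; lam_j)."""
    comp = pi[None, :] * poisson.pmf(y[:, None], lam[None, :])  # (n, C)
    return np.log(comp.sum(axis=1)).sum()

def posterior_probs(y, pi, lam):
    """z_ij = pi_j f_j(y_i) / sum_j pi_j f_j(y_i); each row sums to one."""
    comp = pi[None, :] * poisson.pmf(y[:, None], lam[None, :])
    return comp / comp.sum(axis=1, keepdims=True)
```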
Estimation is implemented using the EM algorithm explained below, which may be slow to converge, especially when the starting values are poor. Gradient-based methods, such as Newton–Raphson or BFGS, are also used (see Böhning, 1995). The application reported below uses Newton–Raphson with gradients computed from analytical formulae.
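For illustration only, a minimal sketch of such direct maximization, assuming a two-component Poisson mixture and simulated data, might use BFGS with numerical gradients (rather than the analytical Newton–Raphson gradients of the application) and a logit/log reparameterization to impose $0<\pi_j<1$ and $\lambda_j>0$:

```python
# Illustrative sketch: direct BFGS maximization of a two-component Poisson
# mixture log-likelihood; numerical gradients, unlike the Newton-Raphson
# application with analytical gradients described in the text.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # inverse logit
from scipy.stats import poisson

rng = np.random.default_rng(0)  # simulated data, purely illustrative
y = np.where(rng.random(500) < 0.4, rng.poisson(1.0, 500), rng.poisson(5.0, 500))

def neg_loglik(params):
    pi1 = expit(params[0])              # logit keeps 0 < pi_1 < 1
    pi = np.array([pi1, 1.0 - pi1])     # weights sum to one by construction
    lam = np.exp(params[1:])            # log keeps lambda_j > 0
    comp = pi[None, :] * poisson.pmf(y[:, None], lam[None, :])
    return -np.log(comp.sum(axis=1)).sum()

res = minimize(neg_loglik, x0=np.array([0.0, 0.0, 1.0]), method="BFGS")
pi1_hat, lam_hat = expit(res.x[0]), np.exp(res.x[1:])
```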
The EM algorithm is structured as in Algorithm 15.5.2.1.1.

Algorithm 15.5.2.1.1 EM – implementation
1. Given an initial estimate $[\pi^{(0)},\theta^{(0)}]$, the likelihood function (15.34) may be maximized using the EM algorithm, in which the variable $d_{ij}$ is replaced by its expected value, $E[d_{ij}]=\hat{z}_{ij}$, yielding the expected log-likelihood:
$$
E[\ln L(\theta\mid y,\pi)]=\sum_{i=1}^{n}\sum_{j=1}^{C}\hat{z}_{ij}\left[\ln f_j(y_i;\theta_j)+\ln\pi_j\right]. \tag{15.35}
$$
2. The M-step of the EM algorithm maximizes (15.35) by solving the first-order conditions:
$$
\hat{\pi}_j-n^{-1}\sum_{i=1}^{n}\hat{z}_{ij}=0,\qquad j=1,\ldots,C,
$$
$$
\sum_{i=1}^{n}\sum_{j=1}^{C}\hat{z}_{ij}\,\frac{\partial\ln f_j(y_i;\theta_j)}{\partial\theta_j}=0.
$$
3. Evaluate the marginal posterior probability $z_{ij}\mid\hat{\pi}_j,\hat{\theta}_j$, $j=1,\ldots,C$:
$$
z_{ij}\equiv\Pr[y_i\in \text{population } j]=\frac{\pi_j f_j(y_i\mid x_i,\theta_j)}{\sum_{j=1}^{C}\pi_j f_j(y_i\mid x_i,\theta_j)}.
$$
The E-step of the EM procedure obtains new values of $E[d_{ij}]$ using $E[z_{ij}]=\pi_j$.
4. Repeat steps 1–3 until $|L(\hat{\theta}^{(k+1)})-L(\hat{\theta}^{(k)})|<\text{tol}$, where tol denotes the selected tolerance level.
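A minimal code sketch of Algorithm 15.5.2.1.1 follows, again assuming a hypothetical two-component Poisson mixture, for which the M-step conditions of step 2 have closed-form solutions (posterior-weighted means); in general the second condition requires a numerical solve:

```python
# Sketch of Algorithm 15.5.2.1.1 for a hypothetical two-component Poisson
# mixture; all names are illustrative.
import numpy as np
from scipy.stats import poisson

def em_poisson_mixture(y, pi, lam, tol=1e-8, max_iter=1000):
    loglik_old = -np.inf
    for _ in range(max_iter):
        comp = pi[None, :] * poisson.pmf(y[:, None], lam[None, :])  # pi_j f_j
        loglik = np.log(comp.sum(axis=1)).sum()
        # Step 4: stop when the log-likelihood change falls below tol.
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
        # E-step (step 3): posterior probabilities z_ij.
        z = comp / comp.sum(axis=1, keepdims=True)
        # M-step (step 2): pi_j = n^{-1} sum_i z_ij; for the Poisson density
        # the theta condition gives lambda_j as a posterior-weighted mean of y.
        pi = z.mean(axis=0)
        lam = (z * y[:, None]).sum(axis=0) / z.sum(axis=0)
    return pi, lam, loglik

# Illustrative use on simulated data.
rng = np.random.default_rng(0)
y = np.where(rng.random(500) < 0.4, rng.poisson(1.0, 500), rng.poisson(5.0, 500))
pi_hat, lam_hat, ll = em_poisson_mixture(y, np.array([0.5, 0.5]),
                                         np.array([1.0, 4.0]))
```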
For estimating $\text{var}(\hat{\theta})$, one can use either the observed information matrix or
the robust Eicker–White sandwich formula. Though asymptotically valid given