Pattern Recognition and Machine Learning

654 14. COMBINING MODELS

model combination is to select one of the models to make the prediction, in which
the choice of model is a function of the input variables. Thus different models be-
come responsible for making predictions in different regions of input space. One
widely used framework of this kind is known as a decision tree, in which the selection process can be described as a sequence of binary selections corresponding to
the traversal of a tree structure and is discussed in Section 14.4. In this case, the
individual models are generally chosen to be very simple, and the overall flexibility
of the model arises from the input-dependent selection process. Decision trees can
be applied to both classification and regression problems.
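The hard, input-dependent selection described above can be sketched as a tiny hand-built tree; the thresholds and per-region predictions here are illustrative only, not taken from the text:

```python
def predict(x):
    """Hard-split model selection: a sequence of binary tests on the
    input x routes it to exactly one simple model (here a constant),
    so different models are responsible for different regions."""
    if x < 0.5:          # first binary selection
        if x < 0.2:      # second binary selection
            return 1.0   # model for region x < 0.2
        return 2.0       # model for region 0.2 <= x < 0.5
    return 3.0           # model for region x >= 0.5
```

Note that exactly one leaf model fires for any input; the softened, probabilistic alternative is discussed next.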
One limitation of decision trees is that the division of input space is based on
hard splits in which only one model is responsible for making predictions for any
given value of the input variables. The decision process can be softened by moving
to a probabilistic framework for combining models, as discussed in Section 14.5. For
example, if we have a set of K models for a conditional distribution p(t|x, k) where x is the input variable, t is the target variable, and k = 1, ..., K indexes the model,
then we can form a probabilistic mixture of the form

p(t|x) = \sum_{k=1}^{K} \pi_k(x) \, p(t|x, k)    (14.1)

in which \pi_k(x) = p(k|x) represent the input-dependent mixing coefficients. Such models can be viewed as mixture distributions in which the component densities, as well as the mixing coefficients, are conditioned on the input variables and are known as mixtures of experts. They are closely related to the mixture density network model
discussed in Section 5.6.
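Equation (14.1) can be sketched directly in code. The choices below are assumptions for illustration: a softmax gating network gives \pi_k(x), and each expert is a linear-Gaussian density p(t|x, k) = N(t | w_k x, \sigma^2); neither form is prescribed by the text.

```python
import math

def softmax(zs):
    # Numerically stable softmax: guarantees the mixing
    # coefficients pi_k(x) are positive and sum to one.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def gauss_pdf(t, mu, sigma):
    # Univariate Gaussian density N(t | mu, sigma^2).
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_of_experts_density(t, x, gate_weights, expert_weights, sigma=1.0):
    """Evaluate p(t|x) = sum_k pi_k(x) p(t|x, k), Eq. (14.1).

    gate_weights and expert_weights are hypothetical parameters:
    pi_k(x) = softmax over gate_weights[k] * x, and expert k is
    N(t | expert_weights[k] * x, sigma^2)."""
    pis = softmax([w * x for w in gate_weights])        # pi_k(x) = p(k|x)
    return sum(pi * gauss_pdf(t, w * x, sigma)
               for pi, w in zip(pis, expert_weights))
```

Because the gate depends on x, different experts dominate in different regions of input space, but the transition between them is smooth rather than a hard split.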

14.1 Bayesian Model Averaging


It is important to distinguish between model combination methods and Bayesian
model averaging, as the two are often confused. To understand the difference, consider the example of density estimation using a mixture of Gaussians (Section 9.2) in which several
Gaussian components are combined probabilistically. The model contains a binary
latent variable z that indicates which component of the mixture is responsible for
generating the corresponding data point. Thus the model is specified in terms of a
joint distribution
p(x,z) (14.2)
and the corresponding density over the observed variable x is obtained by marginalizing over the latent variable


p(x) = \sum_{z} p(x, z).    (14.3)
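The marginalization in (14.3) is a one-line sum in code. The sketch below uses a two-component Gaussian mixture with illustrative parameters (weights, means, and variances are assumptions, not from the text):

```python
import math

def gauss_pdf(x, mu, sigma):
    # Univariate Gaussian density N(x | mu, sigma^2).
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Illustrative two-component mixture: z in {0, 1} selects the component.
weights = [0.3, 0.7]            # p(z = k)
means = [0.0, 4.0]              # component means
sigmas = [1.0, 1.0]             # component standard deviations

def joint(x, z):
    # Joint distribution p(x, z) = p(z) p(x|z), Eq. (14.2).
    return weights[z] * gauss_pdf(x, means[z], sigmas[z])

def marginal(x):
    # p(x) = sum_z p(x, z), Eq. (14.3).
    return sum(joint(x, z) for z in range(len(weights)))
```

For each observed x the latent z is summed out, so the density p(x) blends contributions from every component; this is a single model with a latent variable, not an average over alternative models.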