$n$ and variances $v$, $w$, respectively, then the combined random variable has mean
\[
\frac{wm + vn}{v + w} = \frac{\dfrac{m}{v} + \dfrac{n}{w}}{\dfrac{1}{v} + \dfrac{1}{w}}
\]
and variance
\[
\frac{vw}{v + w} = \frac{1}{\dfrac{1}{v} + \dfrac{1}{w}}.
\]
This result is easily extended to the combination of any number of independent normal distributions. The means are combined by a weighted average whose weights are proportional to the inverse variances, and the combined variance is the reciprocal of the sum of the inverse variances.
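As a concrete illustration, here is a minimal sketch of this rule in Python (our own illustration, not code from the text; the function name combine_normals is our choice):

```python
def combine_normals(means, variances):
    """Combine independent normal estimates by inverse-variance weighting.

    Estimate i has mean means[i] and variance variances[i].
    Returns the mean and variance of the combined estimate.
    """
    # The weights are the precisions (inverse variances).
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    combined_mean = sum(w * m for w, m in zip(weights, means)) / total_weight
    # The combined precision is the sum of the individual precisions.
    combined_variance = 1.0 / total_weight
    return combined_mean, combined_variance
```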
We can now combine the two temperature measurements 30.5° ± 0.4°C and 30.2° ± 0.3°C mentioned earlier. The variances are 0.16 and 0.09, so the combined estimate is 30.3° ± 0.24°C. The combined mean is closer to 30.2°C than to 30.5°C because that measurement is the more accurate of the two.
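Using the combine_normals sketch above, this calculation can be checked directly:

```python
import math

mean, var = combine_normals([30.5, 30.2], [0.4 ** 2, 0.3 ** 2])
print(round(mean, 1), round(math.sqrt(var), 2))  # prints: 30.3 0.24
```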
The formula for combining normal distributions applies equally well to multivariate normal distributions. The only differences are that the mean is a vector and the variance is a symmetric matrix (often called the covariance matrix). This formula is the basis for the Kalman filter (Maybeck 1979), in which a sequence of estimates is successively updated by independent observations. The Kalman filter update formula is usually derived from an optimization criterion such as least squares, but nothing more than elementary probability theory is actually necessary.
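In the multivariate case the same rule applies, with precisions (inverse covariance matrices) taking the place of inverse variances. A minimal NumPy sketch of this combination (again our own illustration, not code from the text):

```python
import numpy as np

def combine_mvn(m1, cov1, m2, cov2):
    """Combine two independent multivariate normal estimates."""
    p1 = np.linalg.inv(cov1)          # precision of the first estimate
    p2 = np.linalg.inv(cov2)          # precision of the second estimate
    cov = np.linalg.inv(p1 + p2)      # the precisions add
    mean = cov @ (p1 @ m1 + p2 @ m2)  # precision-weighted mean
    return mean, cov
```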
Information combination is commonly formulated in terms of a priori and a posteriori distributions. The a priori or prior distribution is one of the two distributions being combined, while the experiment or observation is the other one. The a posteriori distribution is the combined distribution. Although the formulation in terms of a priori and a posteriori distributions is equivalent to information combination, it can be somewhat misleading, as it suggests that the two distributions play different roles in the process. In fact, information combination is symmetric: the two distributions being combined play exactly the same role. One of the two distributions will generally have more effect on the result, but this is because it is more accurate, not because it is the prior distribution or the observation.
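This symmetry can be seen directly with the combine_normals sketch above: swapping which estimate is treated as the prior and which as the observation leaves the combined distribution unchanged.

```python
# Treat 30.5 ± 0.4 as the prior and 30.2 ± 0.3 as the observation ...
posterior_a = combine_normals([30.5, 30.2], [0.16, 0.09])
# ... and then the other way around; both give the same mean and variance.
posterior_b = combine_normals([30.2, 30.5], [0.09, 0.16])
print(posterior_a, posterior_b)
```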
Another example of information combination is stochastic inference in a
BN, as presented in section 14.2. The evidence is combined with the BN, and
the distributions of the query nodes are obtained by computing the marginal
distributions of the combined JPD. Since the evidence usually specifies information about only some of the nodes, a full JPD is constructed by using