Pattern Recognition and Machine Learning

2.3. The Gaussian Distribution 95

Figure 2.10 A schematic illustration of two correlated random variables z and θ, together with the regression function f(θ) given by the conditional expectation E[z|θ]. The Robbins-Monro algorithm provides a general sequential procedure for finding the root θ of such functions. [Figure: scatter of (θ, z) observations with the curve f(θ); horizontal axis θ, vertical axis z.]

dissect out the contribution from the final data point x_N, we obtain

$$
\mu_{\mathrm{ML}}^{(N)} = \frac{1}{N}\sum_{n=1}^{N} x_n
= \frac{1}{N}\,x_N + \frac{1}{N}\sum_{n=1}^{N-1} x_n
= \frac{1}{N}\,x_N + \frac{N-1}{N}\,\mu_{\mathrm{ML}}^{(N-1)}
= \mu_{\mathrm{ML}}^{(N-1)} + \frac{1}{N}\left(x_N - \mu_{\mathrm{ML}}^{(N-1)}\right). \tag{2.126}
$$

This result has a nice interpretation, as follows. After observing N − 1 data points we have estimated μ by μ_ML^(N−1). We now observe data point x_N, and we obtain our revised estimate μ_ML^(N) by moving the old estimate a small amount, proportional to 1/N, in the direction of the 'error signal' (x_N − μ_ML^(N−1)). Note that, as N increases, so the contribution from successive data points gets smaller.
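The update (2.126) can be checked numerically: applying it one point at a time must reproduce the batch mean exactly (up to floating-point rounding). The data-generating parameters below are arbitrary choices for illustration.

```python
import random

def sequential_mean(data):
    # Sequential ML estimate of a Gaussian mean, Eq. (2.126):
    # each new point x_N nudges the old estimate by (x_N - mu_old) / N.
    mu = 0.0
    for n, x in enumerate(data, start=1):
        mu = mu + (x - mu) / n
    return mu

random.seed(0)
data = [random.gauss(5.0, 2.0) for _ in range(1000)]  # illustrative sample
batch = sum(data) / len(data)
print(abs(sequential_mean(data) - batch) < 1e-9)  # sequential == batch
```

The 1/N factor plays the role of a shrinking step size, which foreshadows the conditions placed on the coefficients in the Robbins-Monro procedure.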
The result (2.126) will clearly give the same answer as the batch result (2.121) because the two formulae are equivalent. However, we will not always be able to derive a sequential algorithm by this route, and so we seek a more general formulation of sequential learning, which leads us to the Robbins-Monro algorithm. Consider a pair of random variables θ and z governed by a joint distribution p(z, θ). The conditional expectation of z given θ defines a deterministic function f(θ) that is given by
$$
f(\theta) \equiv \mathbb{E}[z \mid \theta] = \int z\, p(z \mid \theta)\, \mathrm{d}z \tag{2.127}
$$

and is illustrated schematically in Figure 2.10. Functions defined in this way are called regression functions.
Our goal is to find the root θ at which f(θ) = 0. If we had a large data set of observations of z and θ, then we could model the regression function directly and then obtain an estimate of its root. Suppose, however, that we observe values of z one at a time and we wish to find a corresponding sequential estimation scheme for θ. The following general procedure for solving such problems was given by
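The root-finding setting above can be sketched with a Robbins-Monro-style iteration. Everything concrete here is an assumption for the example: the regression function f(θ) = θ − 3 (so the root is θ = 3), the Gaussian noise on each observation z, and the step sizes a_N = 1/N, which shrink fast enough for the noise to average out but slowly enough to reach the root.

```python
import random

def noisy_z(theta, rng):
    # One noisy observation z with E[z | theta] = theta - 3 (assumed example).
    return (theta - 3.0) + rng.gauss(0.0, 1.0)

def robbins_monro(rng, steps=10000):
    # Sequentially move theta against the observed error signal z,
    # with step sizes a_N = 1/N.
    theta = 0.0
    for n in range(1, steps + 1):
        theta -= (1.0 / n) * noisy_z(theta, rng)
    return theta

rng = random.Random(1)
print(robbins_monro(rng))  # converges toward the root theta = 3
```

With this choice of f, the iteration reduces to the running-mean update of (2.126), which hints at why shrinking 1/N-style step sizes appear in both settings.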