Pattern Recognition and Machine Learning

Appendix D. Calculus of Variations


We can think of a function y(x) as being an operator that, for any input value x,
returns an output value y. In the same way, we can define a functional F[y] to be
an operator that takes a function y(x) and returns an output value F. An example of
a functional is the length of a curve drawn in a two-dimensional plane in which the
path of the curve is defined in terms of a function. In the context of machine learning,
a widely used functional is the entropy H[x] for a continuous variable x because, for
any choice of probability density function p(x), it returns a scalar value representing
the entropy of x under that density. Thus the entropy of p(x) could equally well have
been written as H[p].
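
As a concrete illustration, here is a minimal numerical sketch of the entropy functional: it takes an entire function p(x), here a Gaussian density evaluated on an assumed finite grid, and returns the single scalar H[p], which can be checked against the known closed form ½ ln(2πeσ²) for a Gaussian. The grid limits, spacing, and value of σ are illustrative choices.

```python
import numpy as np

# Minimal sketch: the entropy functional H[p] = -∫ p(x) ln p(x) dx maps a
# whole function p(x) to a single scalar. The Gaussian density, grid range,
# and spacing below are illustrative assumptions.

sigma = 2.0
x = np.linspace(-20.0, 20.0, 100001)
dx = x[1] - x[0]
p = np.exp(-x**2 / (2.0 * sigma**2)) / np.sqrt(2.0 * np.pi * sigma**2)

# Riemann-sum approximation of the integral over the grid.
H = -np.sum(p * np.log(p)) * dx

print(H)                                             # ≈ 2.1121
print(0.5 * np.log(2.0 * np.pi * np.e * sigma**2))   # closed form ½ ln(2πeσ²)
```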
A common problem in conventional calculus is to find a value of x that max-
imizes (or minimizes) a function y(x). Similarly, in the calculus of variations we
seek a function y(x) that maximizes (or minimizes) a functional F[y]. That is, of all
possible functions y(x), we wish to find the particular function for which the func-
tional F[y] is a maximum (or minimum). The calculus of variations can be used, for
instance, to show that the shortest path between two points is a straight line or that
the maximum entropy distribution is a Gaussian.
If we weren’t familiar with the rules of ordinary calculus, we could evaluate a
conventional derivative dy/dx by making a small change ε to the variable x and
then expanding in powers of ε, so that

$$
y(x+\epsilon) = y(x) + \frac{dy}{dx}\,\epsilon + O(\epsilon^2)
\tag{D.1}
$$

and finally taking the limit ε → 0. Similarly, for a function of several variables
y(x_1, ..., x_D), the corresponding partial derivatives are defined by

$$
y(x_1+\epsilon_1, \ldots, x_D+\epsilon_D) = y(x_1, \ldots, x_D)
+ \sum_{i=1}^{D} \frac{\partial y}{\partial x_i}\,\epsilon_i + O(\epsilon^2).
\tag{D.2}
$$
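
As a quick numerical check of these expansions, the following sketch perturbs an assumed two-variable test function by small ε_i and compares a direct evaluation against the first-order expansion of (D.2); the residual is O(ε²) and so is vanishingly small. The test function and the values of ε_i are illustrative assumptions.

```python
import numpy as np

# Sketch of (D.1)/(D.2): compare a direct evaluation of y at a perturbed
# point with the first-order Taylor expansion. The test function y and the
# perturbation eps are illustrative assumptions.

def y(x):
    return np.sin(x[0]) * x[1]**2      # y(x1, x2) = sin(x1) * x2^2

x0 = np.array([1.0, 2.0])
eps = np.array([1e-5, -2e-5])

# Analytic partials: ∂y/∂x1 = cos(x1) x2^2,  ∂y/∂x2 = 2 sin(x1) x2
grad = np.array([np.cos(x0[0]) * x0[1]**2, 2.0 * np.sin(x0[0]) * x0[1]])

lhs = y(x0 + eps)                      # exact perturbed value
rhs = y(x0) + grad @ eps               # first-order expansion from (D.2)
print(lhs - rhs)                       # residual is O(ε²)
```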
The analogous definition of a functional derivative arises when we consider how
much a functional F[y] changes when we make a small change εη(x) to the function
y(x), where η(x) is an arbitrary function of x.
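
By analogy with (D.2), the change in the functional can be written to first order as F[y + εη] = F[y] + ε ∫ (δF/δy) η(x) dx. The following sketch checks this numerically for the assumed functional F[y] = ∫ y(x)² dx, whose functional derivative is 2y(x); the choices of y, η, the grid, and ε are all illustrative.

```python
import numpy as np

# Sketch: for the assumed functional F[y] = ∫ y(x)² dx, the functional
# derivative is δF/δy = 2 y(x), so to first order
#   F[y + εη] ≈ F[y] + ε ∫ 2 y(x) η(x) dx.
# The function y, perturbation η, grid, and ε are illustrative assumptions.

x = np.linspace(0.0, 1.0, 10001)
dx = x[1] - x[0]
y_fun = np.sin(2.0 * np.pi * x)        # the function y(x)
eta = x * (1.0 - x)                    # an arbitrary perturbation η(x)
eps = 1e-6

def F(y):
    """F[y] = ∫ y(x)² dx, approximated by a Riemann sum."""
    return np.sum(y**2) * dx

actual = F(y_fun + eps * eta) - F(y_fun)
predicted = eps * np.sum(2.0 * y_fun * eta) * dx   # ε ∫ (δF/δy) η dx
print(actual, predicted)                           # agree to O(ε²)
```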

