Pattern Recognition and Machine Learning

1. INTRODUCTION

extend this approach to deal with input spaces having several variables. If we have
D input variables, then a general polynomial with coefficients up to order 3 would
take the form

y(\mathbf{x}, \mathbf{w}) = w_0 + \sum_{i=1}^{D} w_i x_i + \sum_{i=1}^{D} \sum_{j=1}^{D} w_{ij} x_i x_j + \sum_{i=1}^{D} \sum_{j=1}^{D} \sum_{k=1}^{D} w_{ijk} x_i x_j x_k.    (1.74)

As D increases, so the number of independent coefficients (not all of the coefficients
are independent, due to interchange symmetries amongst the x variables) grows
proportionally to D^3. In practice, to capture complex dependencies in the data, we
may need to use a higher-order polynomial. For a polynomial of order M, the growth
in the number of coefficients is like D^M (Exercise 1.16). Although this is now a
power-law growth, rather than an exponential growth, it still points to the method
becoming rapidly unwieldy and of limited practical utility.
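As an illustrative aside (not part of the original text), the number of independent coefficients can be counted exactly: after accounting for interchange symmetries, the distinct monomials of total degree at most M in D variables number C(D + M, M), which for fixed M grows like D^M. A short Python sketch makes the growth concrete:

```python
from math import comb

def num_independent_coefficients(D, M):
    # Number of distinct monomials of total degree <= M in D variables,
    # after identifying terms related by interchange symmetry
    # (e.g. x_i x_j and x_j x_i): C(D + M, M).
    return comb(D + M, M)

# For order M = 3, the count grows roughly like D^3 / 6 for large D.
for D in (1, 10, 100):
    print(D, num_independent_coefficients(D, 3))
```

For D = 100 and M = 3 this already gives 176,851 coefficients, illustrating why high-order polynomials in many input variables quickly become unwieldy.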
Our geometrical intuitions, formed through a life spent in a space of three di-
mensions, can fail badly when we consider spaces of higher dimensionality. As a
simple example, consider a sphere of radius r = 1 in a space of D dimensions, and
ask what is the fraction of the volume of the sphere that lies between radius r = 1 − \epsilon
and r = 1. We can evaluate this fraction by noting that the volume of a sphere of
radius r in D dimensions must scale as r^D, and so we write

V_D(r) = K_D r^D    (1.75)

where the constant K_D depends only on D (Exercise 1.18). Thus the required fraction is given by

\frac{V_D(1) - V_D(1 - \epsilon)}{V_D(1)} = 1 - (1 - \epsilon)^D    (1.76)

which is plotted as a function of \epsilon for various values of D in Figure 1.22. We see
that, for large D, this fraction tends to 1 even for small values of \epsilon. Thus, in spaces
of high dimensionality, most of the volume of a sphere is concentrated in a thin shell
near the surface!
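Equation (1.76) is simple enough to check numerically; the following short sketch (an addition for illustration, not from the original text) evaluates the shell fraction for a fixed \epsilon as D grows:

```python
def shell_fraction(epsilon, D):
    # Fraction of the volume of a unit sphere in D dimensions that lies
    # between radius 1 - epsilon and radius 1, from Eq. (1.76).
    return 1.0 - (1.0 - epsilon) ** D

# Even a thin shell (epsilon = 0.05) contains almost all of the volume
# once D is moderately large.
for D in (1, 2, 20, 200):
    print(D, shell_fraction(0.05, D))
```

For D = 200 the fraction exceeds 0.9999, confirming that nearly all of the volume lies in the thin outer shell.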
As a further example, of direct relevance to pattern recognition, consider the
behaviour of a Gaussian distribution in a high-dimensional space. If we transform
from Cartesian to polar coordinates, and then integrate out the directional variables,
we obtain an expression for the density p(r) as a function of radius r from the origin
(Exercise 1.20). Thus p(r) \delta r is the probability mass inside a thin shell of thickness
\delta r located at radius r. This distribution is plotted, for various values of D, in
Figure 1.23, and we see that for large D the probability mass of the Gaussian is
concentrated in a thin shell.
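This concentration effect can also be seen by simulation. The following sketch (an illustrative addition, not from the original text) samples points from a standard D-dimensional Gaussian and examines the distribution of their radii, which for large D cluster tightly around \sqrt{D}:

```python
import random
from math import sqrt

def radius_stats(D, n_samples=2000, seed=0):
    # Draw n_samples points from a standard D-dimensional Gaussian and
    # return the sample mean and standard deviation of the radii ||x||.
    rng = random.Random(seed)
    radii = [sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(D)))
             for _ in range(n_samples)]
    mean = sum(radii) / n_samples
    var = sum((r - mean) ** 2 for r in radii) / n_samples
    return mean, sqrt(var)

for D in (1, 10, 100):
    mean, sd = radius_stats(D)
    print(f"D={D}: mean radius ~ {mean:.2f} "
          f"(sqrt(D) = {sqrt(D):.2f}), sd ~ {sd:.2f}")
```

As D grows the mean radius increases like \sqrt{D} while the spread stays roughly constant, so the probability mass concentrates in a thin shell far from the origin, even though the density itself is largest at the origin.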
The severe difficulty that can arise in spaces of many dimensions is sometimes
called the curse of dimensionality (Bellman, 1961). In this book, we shall make ex-
tensive use of illustrative examples involving input spaces of one or two dimensions,
because this makes it particularly easy to illustrate the techniques graphically. The
reader should be warned, however, that not all intuitions developed in spaces of low
dimensionality will generalize to spaces of many dimensions.
