Pattern Recognition and Machine Learning

1. INTRODUCTION

Thus we can view the mutual information as the reduction in the uncertainty about x
by virtue of being told the value of y (or vice versa). From a Bayesian perspective,
we can view p(x) as the prior distribution for x and p(x|y) as the posterior distribution
after we have observed new data y. The mutual information therefore represents
the reduction in uncertainty about x as a consequence of the new observation y.

Exercises


1.1 () www Consider the sum-of-squares error function given by (1.2) in which
the function y(x, w) is given by the polynomial (1.1). Show that the coefficients
w = {w_i} that minimize this error function are given by the solution to the following
set of linear equations

∑_{j=0}^{M} A_{ij} w_j = T_i    (1.122)

where

A_{ij} = ∑_{n=1}^{N} (x_n)^{i+j},    T_i = ∑_{n=1}^{N} (x_n)^i t_n.    (1.123)

Here a suffix i or j denotes the index of a component, whereas (x)^i denotes x raised
to the power of i.
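The system (1.122)–(1.123) can be checked numerically without solving it: if the targets are generated exactly by a polynomial, its true coefficients must satisfy the equations with zero residual. The data, degree M, and coefficients below are illustrative assumptions, not from the text:

```python
# Sketch verifying the normal equations (1.122)-(1.123) numerically.
# The degree M, coefficients, and data points are hypothetical choices.

M = 2                                  # polynomial degree (assumed)
w_true = [1.0, -2.0, 0.5]              # coefficients w_0..w_M of y(x, w)
xs = [0.0, 0.5, 1.0, 1.5, 2.0]         # N = 5 input points

# Targets generated exactly from the polynomial, t_n = sum_k w_k (x_n)^k,
# so w_true minimizes the sum-of-squares error (zero error) and must
# therefore satisfy (1.122).
ts = [sum(w_true[k] * x**k for k in range(M + 1)) for x in xs]

# Build A_ij = sum_n (x_n)^(i+j) and T_i = sum_n (x_n)^i t_n   (1.123)
A = [[sum(x**(i + j) for x in xs) for j in range(M + 1)] for i in range(M + 1)]
T = [sum(x**i * t for x, t in zip(xs, ts)) for i in range(M + 1)]

# Check (1.122): sum_j A_ij w_j == T_i for every i
residuals = [sum(A[i][j] * w_true[j] for j in range(M + 1)) - T[i]
             for i in range(M + 1)]
print(all(abs(r) < 1e-9 for r in residuals))  # True: exact fit, zero residual
```

With noisy targets the residuals would no longer vanish for w_true; the solution of (1.122) would instead give the least-squares coefficients.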

1.2 () Write down the set of coupled linear equations, analogous to (1.122), satisfied
by the coefficients w_i which minimize the regularized sum-of-squares error function
given by (1.4).
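As a hint of where the differentiation leads, assuming the regularizer in (1.4) is the quadratic penalty (λ/2)‖w‖², each equation in (1.122) simply acquires an extra λw_i term:

```latex
% Sketch, assuming (1.4) adds the penalty (\lambda/2)\|\mathbf{w}\|^2:
% setting \partial \widetilde{E}/\partial w_i = 0 gives
\sum_{j=0}^{M} \left( A_{ij} + \lambda I_{ij} \right) w_j = T_i
% where I_{ij} = 1 if i = j and 0 otherwise, and A_{ij}, T_i are as in (1.123).
```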

1.3 () Suppose that we have three coloured boxes r (red), b (blue), and g (green).
Box r contains 3 apples, 4 oranges, and 3 limes, box b contains 1 apple, 1 orange,
and 0 limes, and box g contains 3 apples, 3 oranges, and 4 limes. If a box is chosen
at random with probabilities p(r) = 0.2, p(b) = 0.2, p(g) = 0.6, and a piece of
fruit is removed from the box (with equal probability of selecting any of the items in
the box), then what is the probability of selecting an apple? If we observe that the
selected fruit is in fact an orange, what is the probability that it came from the green
box?
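One way to check the arithmetic for this kind of problem is to apply the sum and product rules directly to the stated counts and box probabilities:

```python
# Sketch applying the sum rule and Bayes' theorem to the exercise's numbers.
fruits = {'r': {'apple': 3, 'orange': 4, 'lime': 3},
          'b': {'apple': 1, 'orange': 1, 'lime': 0},
          'g': {'apple': 3, 'orange': 3, 'lime': 4}}
p_box = {'r': 0.2, 'b': 0.2, 'g': 0.6}

def p_fruit_given_box(fruit, box):
    counts = fruits[box]
    return counts[fruit] / sum(counts.values())  # uniform draw within the box

# Sum rule: p(apple) = sum_box p(apple | box) p(box)
p_apple = sum(p_fruit_given_box('apple', b) * p_box[b] for b in p_box)

# Bayes' theorem: p(g | orange) = p(orange | g) p(g) / p(orange)
p_orange = sum(p_fruit_given_box('orange', b) * p_box[b] for b in p_box)
p_g_given_orange = p_fruit_given_box('orange', 'g') * p_box['g'] / p_orange

print(round(p_apple, 2), round(p_g_given_orange, 2))  # 0.34 0.5
```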

1.4 () www Consider a probability density p_x(x) defined over a continuous variable
x, and suppose that we make a nonlinear change of variable using x = g(y),
so that the density transforms according to (1.27). By differentiating (1.27), show
that the location ŷ of the maximum of the density in y is not in general related to the
location x̂ of the maximum of the density over x by the simple functional relation
x̂ = g(ŷ) as a consequence of the Jacobian factor. This shows that the maximum
of a probability density (in contrast to a simple function) is dependent on the choice
of variable. Verify that, in the case of a linear transformation, the location of the
maximum transforms in the same way as the variable itself.
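The effect of the Jacobian factor can also be seen numerically. The Gaussian density and the particular nonlinear map g(y) = exp(y) below are illustrative choices, not from the text:

```python
# Numeric sketch: under a nonlinear change of variable x = g(y), the mode of
# p_y does NOT satisfy x_hat = g(y_hat). Gaussian p_x and g(y) = exp(y) are
# assumed here purely for illustration.
import math

mu, sigma = 2.0, 1.0                     # p_x is N(mu, sigma^2); mode x_hat = mu

def p_x(x):
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def g(y):
    return math.exp(y)                   # nonlinear map, g'(y) = exp(y)

def p_y(y):
    return p_x(g(y)) * abs(g(y))         # (1.27): p_y(y) = p_x(g(y)) |g'(y)|

# Grid search for the mode y_hat of p_y over y in [-4, 4]
ys = [i / 10000 for i in range(-40000, 40000)]
y_hat = max(ys, key=p_y)

# g(y_hat) lands at 1 + sqrt(2) ~ 2.414, not at x_hat = mu = 2:
print(abs(g(y_hat) - mu) > 0.1)          # True: the modes disagree
```

Setting d/dy log p_y(y) = 0 for this example gives e^{2y} − 2e^y − 1 = 0, i.e. g(ŷ) = 1 + √2, which the grid search confirms; a linear g would instead shift the mode exactly as x̂ = g(ŷ).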

1.5 () Using the definition (1.38) show that var[f(x)] satisfies (1.39).
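The identity can be sanity-checked on a small discrete distribution, computing the variance both from its definition and from the rearranged form var[f] = E[f(x)²] − E[f(x)]²; the distribution and f below are arbitrary illustrative choices:

```python
# Numeric check of var[f] = E[f(x)^2] - E[f(x)]^2 on a toy discrete
# distribution; the probabilities and f are assumed for illustration.
xs = [0, 1, 2]
probs = [0.5, 0.3, 0.2]                   # sums to 1
f = lambda x: x**2 + 1                    # arbitrary function f(x)

E_f = sum(p * f(x) for x, p in zip(xs, probs))      # E[f(x)]
E_f2 = sum(p * f(x)**2 for x, p in zip(xs, probs))  # E[f(x)^2]

# Variance from the definition: E[(f(x) - E[f(x)])^2]
var_def = sum(p * (f(x) - E_f)**2 for x, p in zip(xs, probs))

print(abs(var_def - (E_f2 - E_f**2)) < 1e-12)  # True: both forms agree
```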