Mathematical Methods for Physics and Engineering : A Comprehensive Guide


The other possibility is that $\lambda$ is an independent parameter and not a function of the parameters $\mathbf{a}$. In this case, the extended log-likelihood function is

$$\ln L = N \ln\lambda - \lambda + \sum_{i=1}^{N} \ln P(x_i \mid \mathbf{a}), \qquad (31.89)$$

where we have omitted terms not depending on $\lambda$ or $\mathbf{a}$. Differentiating with respect to $\lambda$ and setting the result equal to zero, we find that the ML estimate of $\lambda$ is simply

$$\hat{\lambda} = N.$$
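The differentiation step can be written out explicitly. The short derivation below uses only (31.89); the second derivative is added here as a check that the stationary point is indeed a maximum:

$$\frac{\partial \ln L}{\partial \lambda} = \frac{N}{\lambda} - 1 = 0 \;\Longrightarrow\; \hat{\lambda} = N, \qquad \left.\frac{\partial^2 \ln L}{\partial \lambda^2}\right|_{\lambda=\hat{\lambda}} = -\frac{N}{\hat{\lambda}^2} = -\frac{1}{N} < 0.$$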

By differentiating (31.89) with respect to the parameters $a_i$ and setting the results equal to zero, we obtain the usual ML estimates $\hat{a}_i$ of their values. In this case, however, the errors in our estimates will be larger, in general, than those in the standard likelihood approach, since they must include the effect of statistical uncertainty in the parameter $\lambda$.

31.6 The method of least squares

The method of least squares is, in fact, just a special case of the method of maximum likelihood. Nevertheless, it is so widely used as a method of parameter estimation that it has acquired a special name of its own. At the outset, let us suppose that a data sample consists of a set of pairs $(x_i, y_i)$, $i = 1, 2, \ldots, N$. For example, these data might correspond to the temperature $y_i$ measured at various points $x_i$ along some metal rod.

For the moment, we will suppose that the $x_i$ are known exactly, whereas there exists a measurement error (or noise) $n_i$ on each of the values $y_i$. Moreover, let us assume that the true value of $y$ at any position $x$ is given by some function $y = f(x; \mathbf{a})$ that depends on the $M$ unknown parameters $\mathbf{a}$. Then

$$y_i = f(x_i; \mathbf{a}) + n_i.$$

Our aim is to estimate the values of the parameters $\mathbf{a}$ from the data sample.
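As a concrete illustration of this data model, the following minimal sketch simulates a sample of the form "model value plus noise". Everything specific in it is an assumption made for illustration and is not taken from the text: the straight-line form of `f`, the parameter values in `a_true`, the grid of positions, and the choice of independent Gaussian noise with standard deviation `sigma`.

```python
import numpy as np

def f(x, a):
    """Hypothetical model function: a straight line a[0] + a[1] * x."""
    return a[0] + a[1] * x

rng = np.random.default_rng(seed=0)

a_true = np.array([20.0, 3.5])       # assumed "true" parameters (illustrative only)
x = np.linspace(0.0, 1.0, 11)        # positions x_i, taken as known exactly
sigma = 0.5                          # assumed noise standard deviation
n = rng.normal(0.0, sigma, x.size)   # noise n_i with zero mean
y = f(x, a_true) + n                 # measurements y_i = f(x_i; a) + n_i
```

In a real experiment only `x` and `y` would be available; `a_true` and `n` are known here only because the data are simulated, which makes such a sample convenient for testing an estimator.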

Bearing in mind the central limit theorem, let us suppose that the $n_i$ are drawn from a Gaussian distribution with no systematic bias and hence zero mean. In the most general case the measurement errors $n_i$ might not be independent but be described by an $N$-dimensional multivariate Gaussian with non-trivial covariance matrix $\mathsf{N}$, whose elements $N_{ij} = \mathrm{Cov}[n_i, n_j]$ we assume to be known. Under these assumptions it follows from (30.148) that the likelihood function is

$$L(\mathbf{x}, \mathbf{y}; \mathbf{a}) = \frac{1}{(2\pi)^{N/2}\,|\mathsf{N}|^{1/2}} \exp\left[-\tfrac{1}{2}\chi^2(\mathbf{a})\right],$$