31.5 MAXIMUM-LIKELIHOOD METHOD
where H denotes our hypothesis of an assumed functional form. Now, using
Bayes' theorem (see subsection 30.2.3), we may write

P(a|x, H) = P(x|a, H) P(a|H) / P(x|H),    (31.81)
which provides us with an expression for the probability distribution P(a|x, H)
of the parameters a, given the (fixed) data x and our hypothesis H, in terms of
other quantities that we may assign. The various terms in (31.81) have special
formal names, as follows.
- The quantity P(a|H) on the RHS is the prior probability, which represents our
state of knowledge of the parameter values (given the hypothesis H) before we
have analysed the data.
- This probability is modified by the experimental data x through the likelihood
P(x|a, H).
- When appropriately normalised by the evidence P(x|H), this yields the posterior
probability P(a|x, H), which is the quantity of interest.
- The posterior encodes all our inferences about the values of the parameters a.
Strictly speaking, from a Bayesian viewpoint, this entire function, P(a|x, H), is
the 'answer' to a parameter estimation problem.
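The roles of the four terms in (31.81) can be made concrete with a short numerical sketch. The example below is hypothetical (a coin-bias parameter a, with data x being 7 heads in 10 tosses) and is not taken from the text; each array corresponds to one factor in Bayes' theorem.

```python
import numpy as np

# Hypothetical example: infer the bias a of a coin from the (fixed) data
# x = 7 heads in n = 10 tosses, exhibiting each term in (31.81).
a = np.linspace(0.0, 1.0, 1001)                  # grid over the parameter a
da = a[1] - a[0]

prior = np.ones_like(a)                          # P(a|H): a uniform prior
prior /= prior.sum() * da                        # normalised on the grid

heads, n = 7, 10                                 # the data x
likelihood = a**heads * (1.0 - a)**(n - heads)   # P(x|a,H), up to a constant

evidence = (likelihood * prior).sum() * da       # P(x|H): normalising factor
posterior = likelihood * prior / evidence        # P(a|x,H)
```

The posterior array is the full Bayesian 'answer': any summary (mode, mean, credible interval) is computed from it.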
Given a particular hypothesis, the (normalising) evidence factor P(x|H) is
unimportant, since it does not depend explicitly upon the parameter values a.
Thus, it is often omitted and one considers only the proportionality relation

P(a|x, H) ∝ P(x|a, H) P(a|H).    (31.82)
If necessary, the posterior distribution can be normalised empirically, by requiring
that it integrates to unity, i.e. ∫ P(a|x, H) d^m a = 1, where the integral extends over
all values of the parameters a_1, a_2, ..., a_m.
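Empirical normalisation of the proportionality (31.82) can be sketched numerically for m = 2. The example below is an assumed illustration, not from the text: Gaussian data with unknown mean mu and width sigma, so a = (mu, sigma); the unnormalised product likelihood × prior is evaluated on a grid and then divided by its sum times the grid-cell area.

```python
import numpy as np

# Assumed example: Gaussian data, parameters a = (mu, sigma), m = 2.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=20)                # the (fixed) data x

mu = np.linspace(0.0, 4.0, 201)
sigma = np.linspace(0.2, 3.0, 141)
MU, SIG = np.meshgrid(mu, sigma, indexing="ij")  # grid over the parameters

# log-likelihood ln P(x|a,H) for independent Gaussian samples
loglike = (-0.5 * ((x[:, None, None] - MU) / SIG) ** 2
           - np.log(SIG)).sum(axis=0)
prior = np.ones_like(MU)                         # broad (uniform) prior P(a|H)

unnorm = np.exp(loglike - loglike.max()) * prior # RHS of (31.82)
dmu, dsig = mu[1] - mu[0], sigma[1] - sigma[0]
posterior = unnorm / (unnorm.sum() * dmu * dsig) # now integrates to unity
```

Subtracting loglike.max() before exponentiating avoids numerical underflow; the constant cancels in the empirical normalisation.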
The prior P(a|H) in (31.82) should reflect our entire knowledge concerning the
values of the parameters a, before the analysis of the current data x. For example,
there may be some physical reason to require some or all of the parameters to
lie in a given range. If we are largely ignorant of the values of the parameters,
we often indicate this by choosing a uniform (or very broad) prior,

P(a|H) = constant,

in which case the posterior distribution is simply proportional to the likelihood.
In this case, we thus have

P(a|x, H) ∝ L(x; a).    (31.83)

In other words, if we assume a uniform prior then we can identify the posterior
distribution (up to a normalising factor) with L(x; a), considered as a function of
the parameters a.
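The identification in (31.83) can be sketched numerically. In the assumed example below (independent Poisson counts with unknown rate a; the data values are illustrative), the posterior under a uniform prior is just the renormalised likelihood, and its mode is the maximum-likelihood estimate, here the sample mean.

```python
import numpy as np

# Assumed example: Poisson counts with unknown rate a; uniform prior,
# so by (31.83) the posterior is the likelihood L(x; a), renormalised.
a = np.linspace(0.01, 10.0, 1000)                # grid over the rate a
counts = np.array([3, 5, 4, 6, 2])               # the (fixed) data x

# ln L(x; a) for independent Poisson counts (constant x_i! terms dropped)
loglike = counts.sum() * np.log(a) - len(counts) * a
like = np.exp(loglike - loglike.max())

da = a[1] - a[0]
posterior = like / (like.sum() * da)             # posterior = normalised likelihood

a_ml = a[np.argmax(posterior)]                   # posterior mode = ML estimate
```

For a Poisson likelihood the mode sits at the sample mean of the counts (here 20/5 = 4), which is the maximum-likelihood estimate developed in the remainder of this section.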