Mathematical Methods for Physics and Engineering: A Comprehensive Guide


31.5 MAXIMUM-LIKELIHOOD METHOD


where H denotes our hypothesis of an assumed functional form. Now, using Bayes' theorem (see subsection 30.2.3), we may write


P(a|x,H) = P(x|a,H) P(a|H) / P(x|H),   (31.81)

which provides us with an expression for the probability distribution P(a|x,H) of the parameters a, given the (fixed) data x and our hypothesis H, in terms of other quantities that we may assign. The various terms in (31.81) have special formal names, as follows.



  • The quantity P(a|H) on the RHS is the prior probability, which represents our state of knowledge of the parameter values (given the hypothesis H) before we have analysed the data.

  • This probability is modified by the experimental data x through the likelihood P(x|a,H).

  • When appropriately normalised by the evidence P(x|H), this yields the posterior probability P(a|x,H), which is the quantity of interest.

  • The posterior encodes all our inferences about the values of the parameters a. Strictly speaking, from a Bayesian viewpoint, this entire function, P(a|x,H), is the 'answer' to a parameter estimation problem.
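As a small numerical sketch (not from the text), the roles of prior, likelihood, evidence and posterior in (31.81) can be illustrated for a parameter restricted to just two discrete candidate values; the particular numbers here are invented for illustration.

```python
import numpy as np

# Discrete illustration of (31.81): two candidate parameter values a1, a2.
prior = np.array([0.5, 0.5])           # P(a|H): our knowledge before the data
likelihood = np.array([0.8, 0.2])      # P(x|a,H): how well each value explains x

evidence = np.sum(likelihood * prior)  # P(x|H): the normalising constant
posterior = likelihood * prior / evidence  # P(a|x,H) via Bayes' theorem

print(posterior)        # [0.8 0.2]
print(posterior.sum())  # 1.0
```

With equal priors the posterior simply reproduces the (normalised) likelihood, anticipating the uniform-prior result discussed below.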

Given a particular hypothesis, the (normalising) evidence factor P(x|H) is unimportant, since it does not depend explicitly upon the parameter values a. Thus, it is often omitted and one considers only the proportionality relation

P(a|x,H) ∝ P(x|a,H) P(a|H).   (31.82)

If necessary, the posterior distribution can be normalised empirically, by requiring that it integrates to unity, i.e. ∫ P(a|x,H) d^m a = 1, where the integral extends over all values of the parameters a_1, a_2, ..., a_m.
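The proportionality (31.82) followed by empirical normalisation can be sketched numerically on a grid. The model below (Gaussian data with known variance, unknown mean, and a broad Gaussian prior) is an assumed example, not one taken from the text.

```python
import numpy as np

# Hypothetical data: 20 Gaussian measurements with known sigma, unknown mean a.
rng = np.random.default_rng(0)
sigma = 1.0
x = rng.normal(2.0, sigma, size=20)

# Grid of candidate parameter values a.
a = np.linspace(0.0, 4.0, 401)

# Log-likelihood ln L(x; a) = -sum_i (x_i - a)^2 / (2 sigma^2) + const.
log_like = -np.sum((x[:, None] - a[None, :]) ** 2, axis=0) / (2 * sigma**2)

# A broad Gaussian prior P(a|H); any proper prior would serve here.
log_prior = -(a**2) / (2 * 10.0**2)

# Unnormalised posterior, as in (31.82); subtract the max before
# exponentiating for numerical stability.
log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())

# Empirical normalisation: divide so the integral over a is unity.
post /= np.trapz(post, a)

print(np.trapz(post, a))  # ≈ 1.0, up to quadrature rounding
```

The evidence P(x|H) never needs to be computed explicitly: dividing by the numerical integral plays exactly the role of the omitted normalising factor.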


The prior P(a|H) in (31.82) should reflect our entire knowledge concerning the values of the parameters a, before the analysis of the current data x. For example, there may be some physical reason to require some or all of the parameters to lie in a given range. If we are largely ignorant of the values of the parameters, we often indicate this by choosing a uniform (or very broad) prior,

P(a|H) = constant,

in which case the posterior distribution is simply proportional to the likelihood.


In this case, we thus have

P(a|x,H) ∝ L(x; a).   (31.83)

In other words, if we assume a uniform prior then we can identify the posterior distribution (up to a normalising factor) with L(x; a), considered as a function of the parameters a.
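A short sketch of (31.83), under the same assumed Gaussian-mean model used for illustration above: with a uniform prior, the normalised likelihood is the posterior, and its mode is the maximum-likelihood estimate, which for this model is the sample mean.

```python
import numpy as np

# Hypothetical data: 10 Gaussian measurements with unit variance, unknown mean a.
rng = np.random.default_rng(1)
x = rng.normal(1.5, 1.0, size=10)
a = np.linspace(-2.0, 5.0, 701)

# Log-likelihood of the data as a function of the mean a.
log_like = -0.5 * np.sum((x[:, None] - a[None, :]) ** 2, axis=0)

# Uniform prior: the posterior is just the normalised likelihood, as in (31.83).
post = np.exp(log_like - log_like.max())
post /= np.trapz(post, a)

# The posterior mode coincides (up to the grid spacing of 0.01) with the
# maximum-likelihood estimate, i.e. the sample mean for this model.
a_mode = a[np.argmax(post)]
print(abs(a_mode - x.mean()) < 0.02)  # True
```

This makes concrete the statement that, under a uniform prior, maximising the posterior and maximising the likelihood are the same operation.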
