31.5 MAXIMUM-LIKELIHOOD METHOD
where H denotes our hypothesis of an assumed functional form. Now, using
Bayes' theorem (see subsection 30.2.3), we may write

P(a|x, H) = P(x|a, H) P(a|H) / P(x|H),    (31.81)
which provides us with an expression for the probability distribution P(a|x, H)
of the parameters a, given the (fixed) data x and our hypothesis H, in terms of
other quantities that we may assign. The various terms in (31.81) have special
formal names, as follows.
- The quantity P(a|H) on the RHS is the prior probability, which represents our
state of knowledge of the parameter values (given the hypothesis H) before we
have analysed the data.
- This probability is modified by the experimental data x through the likelihood
P(x|a, H).
- When appropriately normalised by the evidence P(x|H), this yields the posterior
probability P(a|x, H), which is the quantity of interest.
- The posterior encodes all our inferences about the values of the parameters a.
Strictly speaking, from a Bayesian viewpoint, this entire function, P(a|x, H), is
the 'answer' to a parameter estimation problem.
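The roles of the four terms in (31.81) can be made concrete with a short numerical sketch. The example below is hypothetical (a coin-bias parameter a, with data x being 7 heads in 10 tosses) and is not taken from the text; each array corresponds to one factor in Bayes' theorem.

```python
import numpy as np

# Hypothetical example: infer the bias a of a coin from the (fixed) data
# x = 7 heads in n = 10 tosses, exhibiting each term in (31.81).
a = np.linspace(0.0, 1.0, 1001)                  # grid over the parameter a
da = a[1] - a[0]

prior = np.ones_like(a)                          # P(a|H): a uniform prior
prior /= prior.sum() * da                        # normalised on the grid

heads, n = 7, 10                                 # the data x
likelihood = a**heads * (1.0 - a)**(n - heads)   # P(x|a,H), up to a constant

evidence = (likelihood * prior).sum() * da       # P(x|H): normalising factor
posterior = likelihood * prior / evidence        # P(a|x,H)
```

The posterior array is the full Bayesian 'answer': any summary (mode, mean, credible interval) is computed from it.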
Given a particular hypothesis, the (normalising) evidence factor P(x|H) is
unimportant, since it does not depend explicitly upon the parameter values a.
Thus, it is often omitted and one considers only the proportionality relation

P(a|x, H) ∝ P(x|a, H) P(a|H).    (31.82)
If necessary, the posterior distribution can be normalised empirically, by requiring
that it integrates to unity, i.e. ∫ P(a|x, H) d^m a = 1, where the integral extends over
all values of the parameters a_1, a_2, ..., a_m.
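Empirical normalisation of the proportionality (31.82) can be sketched numerically for m = 2. The example below is an assumed illustration, not from the text: Gaussian data with unknown mean mu and width sigma, so a = (mu, sigma); the unnormalised product likelihood × prior is evaluated on a grid and then divided by its sum times the grid-cell area.

```python
import numpy as np

# Assumed example: Gaussian data, parameters a = (mu, sigma), m = 2.
rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=20)                # the (fixed) data x

mu = np.linspace(0.0, 4.0, 201)
sigma = np.linspace(0.2, 3.0, 141)
MU, SIG = np.meshgrid(mu, sigma, indexing="ij")  # grid over the parameters

# log-likelihood ln P(x|a,H) for independent Gaussian samples
loglike = (-0.5 * ((x[:, None, None] - MU) / SIG) ** 2
           - np.log(SIG)).sum(axis=0)
prior = np.ones_like(MU)                         # broad (uniform) prior P(a|H)

unnorm = np.exp(loglike - loglike.max()) * prior # RHS of (31.82)
dmu, dsig = mu[1] - mu[0], sigma[1] - sigma[0]
posterior = unnorm / (unnorm.sum() * dmu * dsig) # now integrates to unity
```

Subtracting loglike.max() before exponentiating avoids numerical underflow; the constant cancels in the empirical normalisation.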
The prior P(a|H) in (31.82) should reflect our entire knowledge concerning the
values of the parameters a, before the analysis of the current data x. For example,
there may be some physical reason to require some or all of the parameters to
lie in a given range. If we are largely ignorant of the values of the parameters,
we often indicate this by choosing a uniform (or very broad) prior,

P(a|H) = constant,

in which case the posterior distribution is simply proportional to the likelihood.
In this case, we thus have

P(a|x, H) ∝ L(x; a).    (31.83)

In other words, if we assume a uniform prior then we can identify the posterior
distribution (up to a normalising factor) with L(x; a), considered as a function of
the parameters a.
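The identification in (31.83) can be sketched numerically. In the assumed example below (independent Poisson counts with unknown rate a; the data values are illustrative), the posterior under a uniform prior is just the renormalised likelihood, and its mode is the maximum-likelihood estimate, here the sample mean.

```python
import numpy as np

# Assumed example: Poisson counts with unknown rate a; uniform prior,
# so by (31.83) the posterior is the likelihood L(x; a), renormalised.
a = np.linspace(0.01, 10.0, 1000)                # grid over the rate a
counts = np.array([3, 5, 4, 6, 2])               # the (fixed) data x

# ln L(x; a) for independent Poisson counts (constant x_i! terms dropped)
loglike = counts.sum() * np.log(a) - len(counts) * a
like = np.exp(loglike - loglike.max())

da = a[1] - a[0]
posterior = like / (like.sum() * da)             # posterior = normalised likelihood

a_ml = a[np.argmax(posterior)]                   # posterior mode = ML estimate
```

For a Poisson likelihood the mode sits at the sample mean of the counts (here 20/5 = 4), which is the maximum-likelihood estimate developed in the remainder of this section.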