Understanding Machine Learning: From Theory to Algorithms




As before, given a specific value of θ, it is assumed that the conditional proba-
bility, P[X=x|θ], is known. In the drug company example, X takes values in
{0, 1} and P[X=x|θ] = θ^x (1−θ)^{1−x}.
Once the prior distribution over θ and the conditional distribution over X
given θ are defined, we again have complete knowledge of the distribution over
X. This is because we can write the probability over X as a marginal probability

P[X=x] = \sum_{\theta} P[X=x, \theta] = \sum_{\theta} P[\theta] \, P[X=x|\theta],

where the last equality follows from the definition of conditional probability. If
θ is continuous we replace P[θ] with the density function and the sum becomes
an integral:

P[X=x] = \int_{\theta} P[\theta] \, P[X=x|\theta] \, d\theta.

Seemingly, once we know P[X=x], a training set S = (x_1, ..., x_m) tells us
nothing as we are already experts who know the distribution over a new point
X. However, the Bayesian view introduces dependency between S and X. This is
because we now refer to θ as a random variable. A new point X and the previous
points in S are independent only conditioned on θ. This is different from the
frequentist philosophy in which θ is a parameter that we might not know, but
since it is just a parameter of the distribution, a new point X and previous points
S are always independent.
In the Bayesian framework, since X and S are not independent anymore, what
we would like to calculate is the probability of X given S, which by the chain
rule can be written as follows:

P[X=x|S] = \sum_{\theta} P[X=x|\theta, S] \, P[\theta|S] = \sum_{\theta} P[X=x|\theta] \, P[\theta|S].

The second equality follows from the assumption that X and S are independent
when we condition on θ. Using Bayes' rule we have

P[\theta|S] = \frac{P[S|\theta] \, P[\theta]}{P[S]},

and together with the assumption that points are independent conditioned on θ,
we can write

P[\theta|S] = \frac{P[S|\theta] \, P[\theta]}{P[S]} = \frac{1}{P[S]} \prod_{i=1}^{m} P[X=x_i|\theta] \, P[\theta].
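
Continuing the numerical sketch above (a uniform prior on [0,1] and a hypothetical sample S, both illustrative assumptions), the posterior P[θ|S] can be approximated on the same kind of grid:

```python
import numpy as np

# Hypothetical sample from the drug company example: 1 = the drug worked, 0 = it did not.
S = np.array([1, 0, 1, 1, 0, 1, 1, 1])
m, k = len(S), S.sum()          # sample size and number of ones

# Grid of theta values and an assumed uniform prior density on [0, 1] (illustrative choice).
theta = np.linspace(0.0, 1.0, 10_001)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)

# Likelihood of the whole sample: P[S | theta] = prod_i theta^{x_i} (1 - theta)^{1 - x_i}.
lik_S = theta**k * (1.0 - theta)**(m - k)

# Posterior density P[theta | S] = P[S | theta] P[theta] / P[S], with P[S] approximated
# by numerically integrating the numerator over theta.
p_S = np.sum(lik_S * prior) * dtheta
posterior = lik_S * prior / p_S

print(theta[np.argmax(posterior)])  # posterior mode; here k / m = 0.75
```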

We therefore obtain the following expression for Bayesian prediction:

P[X=x|S] = \frac{1}{P[S]} \sum_{\theta} P[X=x|\theta] \prod_{i=1}^{m} P[X=x_i|\theta] \, P[\theta].    (24.16)

Getting back to our drug company example, we can rewrite P[X=x|S] as

P[X=x|S] = \frac{1}{P[S]} \int \theta^{\,x + \sum_i x_i} (1-\theta)^{\,1 - x + \sum_i (1 - x_i)} \, P[\theta] \, d\theta.
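
Finally, the Bayesian prediction itself can be approximated numerically. The sketch below again assumes a uniform prior over [0,1] and a hypothetical sample S; under that uniform prior the integral above for x = 1 works out to (∑_i x_i + 1)/(m + 2), which the code uses as a sanity check.

```python
import numpy as np

S = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # hypothetical outcomes, 1 = the drug worked
m, k = len(S), S.sum()                    # sample size and number of successes

theta = np.linspace(0.0, 1.0, 100_001)
dtheta = theta[1] - theta[0]
prior = np.ones_like(theta)               # assumed uniform prior density on [0, 1]

# Numerator of the prediction for x = 1: integral over theta of
# theta^{1 + sum_i x_i} * (1 - theta)^{0 + sum_i (1 - x_i)} * P[theta].
numerator = np.sum(theta**(1 + k) * (1.0 - theta)**(m - k) * prior) * dtheta

# P[S] is the same integral without the factor contributed by the new point.
p_S = np.sum(theta**k * (1.0 - theta)**(m - k) * prior) * dtheta

p_x1_given_S = numerator / p_S
print(p_x1_given_S)         # approx. 0.7 for this sample
print((k + 1) / (m + 2))    # closed-form value under the uniform prior, for comparison
```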