582 12. CONTINUOUS LATENT VARIABLES
Figure 12.13 Probabilistic graphical model for Bayesian PCA in which the
distribution over the parameter matrix $\mathbf{W}$ is governed by a vector
$\boldsymbol{\alpha}$ of hyperparameters.
proximation, which is appropriate when the number of data points is relatively large
and the corresponding posterior distribution is tightly peaked (Bishop, 1999a). It
involves a specific choice of prior over $\mathbf{W}$ that allows surplus dimensions in the
principal subspace to be pruned out of the model. This corresponds to an example of
automatic relevance determination, or ARD, discussed in Section 7.2.2. Specifically,
we define an independent Gaussian prior over each column of $\mathbf{W}$; these columns
represent the vectors defining the principal subspace. Each such Gaussian has an
independent variance governed by a precision hyperparameter $\alpha_i$, so that
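$$
p(\mathbf{W} \mid \boldsymbol{\alpha}) = \prod_{i=1}^{M} \left( \frac{\alpha_i}{2\pi} \right)^{D/2} \exp\left\{ -\frac{1}{2} \alpha_i \mathbf{w}_i^{\mathrm{T}} \mathbf{w}_i \right\} \tag{12.60}
$$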
where $\mathbf{w}_i$ is the $i$th column of $\mathbf{W}$. The resulting model can be represented using the
directed graph shown in Figure 12.13.
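To make the ARD prior concrete, the following is a minimal sketch that evaluates the log of (12.60) with NumPy. The function name `log_ard_prior` and the array layout (one column of `W` per principal direction, one entry of `alpha` per column) are assumptions made for illustration only.

```python
import numpy as np

def log_ard_prior(W, alpha):
    """Log of the ARD prior: sum over columns of log N(w_i | 0, alpha_i^{-1} I)."""
    W = np.asarray(W, dtype=float)          # shape (D, M)
    alpha = np.asarray(alpha, dtype=float)  # shape (M,), one precision per column
    D = W.shape[0]
    sq_norms = np.sum(W ** 2, axis=0)       # w_i^T w_i for each column i
    # Each column contributes (D/2) log(alpha_i / 2 pi) - (1/2) alpha_i w_i^T w_i.
    terms = 0.5 * D * np.log(alpha / (2.0 * np.pi)) - 0.5 * alpha * sq_norms
    return float(np.sum(terms))
```

A large `alpha[i]` penalizes any nonzero entry in column `i` heavily, which is the mechanism by which surplus columns are pushed towards zero.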
The values for $\alpha_i$ will be found iteratively by maximizing the marginal likelihood
function in which $\mathbf{W}$ has been integrated out. As a result of this optimization,
some of the $\alpha_i$ may be driven to infinity, with the corresponding parameter
vector $\mathbf{w}_i$ being driven to zero (the posterior distribution becomes a delta function at
the origin), giving a sparse solution. The effective dimensionality of the principal
subspace is then determined by the number of finite $\alpha_i$ values, and the corresponding
vectors $\mathbf{w}_i$ can be thought of as 'relevant' for modelling the data distribution.
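The pruning behaviour described above can be sketched numerically. This passage does not state the re-estimation rule, so the sketch assumes the update $\alpha_i = D / (\mathbf{w}_i^{\mathrm{T}} \mathbf{w}_i)$ that is commonly quoted for this model; the helper names, the `floor` guard, and the `alpha_max` threshold standing in for "infinity" are all hypothetical choices for illustration.

```python
import numpy as np

def reestimate_alpha(W, floor=1e-12):
    # Assumed update (not stated in this passage): alpha_i = D / (w_i^T w_i).
    # `floor` avoids division by zero for columns that are exactly zero.
    W = np.asarray(W, dtype=float)
    D = W.shape[0]
    sq_norms = np.sum(W ** 2, axis=0)
    return D / np.maximum(sq_norms, floor)

def effective_dimensionality(alpha, alpha_max=1e6):
    # Precisions above `alpha_max` are treated as "driven to infinity",
    # so only the remaining columns count as relevant.
    return int(np.sum(np.asarray(alpha) < alpha_max))

# Toy matrix with D = 3 rows and M = 3 columns; only the first column is sizeable.
W = np.array([[1.0, 0.0, 1e-5],
              [2.0, 0.0, 0.0],
              [0.5, 0.0, 0.0]])
alpha = reestimate_alpha(W)
n_relevant = effective_dimensionality(alpha)
```

In this toy case the two near-zero columns receive very large precisions and are counted as pruned, leaving a single relevant direction. In a full algorithm this step would alternate with re-fitting $\mathbf{W}$, which is omitted here.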
In this way, the Bayesian approach is automatically making the trade-off between
improving the fit to the data, by using a larger number of vectors $\mathbf{w}_i$ with their
corresponding eigenvalues $\lambda_i$ each tuned to the data, and reducing the complexity of
the model by suppressing some of the $\mathbf{w}_i$ vectors. The origins of this sparsity were
discussed earlier in the context of relevance vector machines (Section 7.2).
The values of $\alpha_i$ are re-estimated during training by maximizing the log of the
marginal likelihood, which is given by
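$$
p(\mathbf{X} \mid \boldsymbol{\alpha}, \boldsymbol{\mu}, \sigma^2) = \int p(\mathbf{X} \mid \mathbf{W}, \boldsymbol{\mu}, \sigma^2)\, p(\mathbf{W} \mid \boldsymbol{\alpha})\, \mathrm{d}\mathbf{W} \tag{12.61}
$$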
where the log of $p(\mathbf{X} \mid \mathbf{W}, \boldsymbol{\mu}, \sigma^2)$ is given by (12.43). Note that for simplicity we also
treat $\boldsymbol{\mu}$ and $\sigma^2$ as parameters to be estimated, rather than defining priors over these
parameters.
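As a rough illustration of what the integral in (12.61) computes, the sketch below estimates the log marginal likelihood by naive Monte Carlo: it draws $\mathbf{W}$ from the ARD prior and averages the likelihood. This estimator is swapped in purely for illustration and is not the approximation scheme the text refers to; it is only workable for very small problems. The likelihood is written in the standard probabilistic PCA form with $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2 \mathbf{I}$, and the function names and defaults are assumptions.

```python
import numpy as np

def ppca_log_likelihood(X, W, mu, sigma2):
    """ln p(X | W, mu, sigma2) for probabilistic PCA, with C = W W^T + sigma2 I."""
    X = np.asarray(X, dtype=float)           # shape (N, D)
    W = np.asarray(W, dtype=float)           # shape (D, M)
    N, D = X.shape
    C = W @ W.T + sigma2 * np.eye(D)
    Xc = X - mu                              # centre the data on mu
    S = Xc.T @ Xc / N                        # scatter of the data about mu
    _, logdet = np.linalg.slogdet(C)
    trace_term = np.trace(np.linalg.solve(C, S))   # Tr(C^{-1} S)
    return -0.5 * N * (D * np.log(2.0 * np.pi) + logdet + trace_term)

def log_marginal_mc(X, alpha, mu, sigma2, n_samples=2000, seed=0):
    """Naive Monte Carlo estimate of ln p(X | alpha, mu, sigma2)."""
    X = np.asarray(X, dtype=float)
    alpha = np.asarray(alpha, dtype=float)   # shape (M,)
    rng = np.random.default_rng(seed)
    D, M = X.shape[1], alpha.size
    log_liks = np.empty(n_samples)
    for s in range(n_samples):
        # Column i of W is drawn from N(0, alpha_i^{-1} I).
        W = rng.standard_normal((D, M)) / np.sqrt(alpha)
        log_liks[s] = ppca_log_likelihood(X, W, mu, sigma2)
    # Log of the average likelihood, computed stably (log-mean-exp).
    m = log_liks.max()
    return float(m + np.log(np.mean(np.exp(log_liks - m))))
```

The variance of this estimator grows quickly with the size of $\mathbf{W}$, which is one reason an analytic approximation is preferred in practice.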