Pattern Recognition and Machine Learning

Section 4.4

Section 3.5.3

12.2. Probabilistic PCA

Because this integration is intractable, we make use of the Laplace approximation. If we assume that the posterior distribution is sharply peaked, as will occur for sufficiently large data sets, then the re-estimation equations obtained by maximizing the marginal likelihood with respect to $\alpha_i$ take the simple form

$$\alpha_i^{\text{new}} = \frac{D}{\mathbf{w}_i^{\mathrm{T}} \mathbf{w}_i} \tag{12.62}$$

which follows from (3.98), noting that the dimensionality of $\mathbf{w}_i$ is $D$. These re-estimations are interleaved with the EM algorithm updates for determining $\mathbf{W}$ and $\sigma^2$. The E-step equations are again given by (12.54) and (12.55). Similarly, the M-step equation for $\sigma^2$ is again given by (12.57). The only change is to the M-step equation for $\mathbf{W}$, which is modified to give

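$$\mathbf{W}_{\text{new}} = \left[ \sum_{n=1}^{N} (\mathbf{x}_n - \bar{\mathbf{x}}) \, \mathbb{E}[\mathbf{z}_n]^{\mathrm{T}} \right] \left[ \sum_{n=1}^{N} \mathbb{E}[\mathbf{z}_n \mathbf{z}_n^{\mathrm{T}}] + \sigma^2 \mathbf{A} \right]^{-1} \tag{12.63}$$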

where $\mathbf{A} = \operatorname{diag}(\alpha_i)$. The value of $\boldsymbol{\mu}$ is given by the sample mean, as before.
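The interleaved updates can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes the E-step and the $\sigma^2$ update take the standard probabilistic PCA forms that the text cites as (12.54), (12.55), and (12.57) but does not reproduce here. The function name `bayesian_pca_em`, the iteration count, the random initialization, and the cap `alpha_max` (which keeps $\alpha_i$ finite when a column of $\mathbf{W}$ collapses to zero) are all illustrative assumptions.

```python
import numpy as np

def bayesian_pca_em(X, M, n_iter=200, seed=0, alpha_max=1e12):
    """Sketch of EM for probabilistic PCA with re-estimation of alpha_i.

    X is an (N, D) data matrix and M is the latent dimensionality.
    Returns (W, sigma2, alpha, mu).
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    mu = X.mean(axis=0)                  # mu is fixed at the sample mean
    Xc = X - mu                          # centred data, rows are x_n - x_bar
    W = rng.normal(size=(D, M))          # assumed random initialization
    sigma2 = 1.0
    alpha = np.ones(M)

    for _ in range(n_iter):
        # E-step (assumed standard PPCA form):
        #   E[z_n]       = Minv W^T (x_n - x_bar),  Minv = (W^T W + sigma2 I)^-1
        #   E[z_n z_n^T] = sigma2 Minv + E[z_n] E[z_n]^T
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
        Ez = Xc @ W @ Minv               # (N, M); row n is E[z_n]^T
        Szz = N * sigma2 * Minv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]

        # Modified M-step for W: the extra term sigma2 * A, A = diag(alpha_i)
        Sxz = Xc.T @ Ez                  # sum over n of (x_n - x_bar) E[z_n]^T
        B = Szz + sigma2 * np.diag(alpha)     # symmetric (M, M) matrix
        W = np.linalg.solve(B, Sxz.T).T       # equals Sxz @ inv(B)

        # M-step for sigma2 (assumed standard PPCA form), using the new W
        sigma2 = (
            np.sum(Xc ** 2)
            - 2.0 * np.sum(Ez * (Xc @ W))
            + np.trace(Szz @ W.T @ W)
        ) / (N * D)

        # Re-estimate alpha_i = D / (w_i^T w_i), capped at alpha_max
        col_norm2 = np.sum(W ** 2, axis=0)
        alpha = D / np.maximum(col_norm2, D / alpha_max)

    return W, sigma2, alpha, mu

# Synthetic data: D = 4, one high-variance direction plus low-variance noise
rng = np.random.default_rng(42)
direction = rng.normal(size=(1, 4))
X = 3.0 * rng.normal(size=(300, 1)) @ direction + 0.1 * rng.normal(size=(300, 4))
W, sigma2, alpha, mu = bayesian_pca_em(X, M=3)
print("alpha:", alpha)
print("sigma2:", sigma2)
```

Columns of `W` whose `alpha` grows very large are effectively switched off, so comparing the entries of `alpha` gives a rough indication of how many latent directions the fit retains.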

If we choose $M = D - 1$ then, if all $\alpha_i$ values are finite, the model represents a full-covariance Gaussian, while if all the $\alpha_i$ go to infinity the model is equivalent to an isotropic Gaussian, and so the model can encompass all permissible values for the effective dimensionality of the principal subspace. It is also possible to consider smaller values of $M$, which will save on computational cost but which will limit the maximum dimensionality of the subspace. A comparison of the results of this algorithm with standard probabilistic PCA is shown in Figure 12.14.

Bayesian PCA provides an opportunity to illustrate the Gibbs sampling algorithm discussed in Section 11.3. Figure 12.15 shows an example of the samples from the hyperparameters $\ln \alpha_i$ for a data set in $D = 4$ dimensions in which the dimensionality of the latent space is $M = 3$ but in which the data set is generated from a probabilistic PCA model having one direction of high variance, with the remaining directions comprising low variance noise. This result shows clearly the presence of three distinct modes in the posterior distribution. At each step of the iteration, one of the hyperparameters has a small value and the remaining two have large values, so that two of the three latent variables are suppressed. During the course of the Gibbs sampling, the solution makes sharp transitions between the three modes.

The model described here involves a prior only over the matrix $\mathbf{W}$. A fully Bayesian treatment of PCA, including priors over $\boldsymbol{\mu}$, $\sigma^2$, and $\boldsymbol{\alpha}$, and solved using variational methods, is described in Bishop (1999b). For a discussion of various Bayesian approaches to determining the appropriate dimensionality for a PCA model, see Minka (2001c).
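The two limiting cases for $M = D - 1$ can be checked by counting parameters. The sketch below assumes the marginal covariance of probabilistic PCA, $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2 \mathbf{I}$, and the usual discount of $M(M-1)/2$ for rotations of the latent space, neither of which is restated in this section:

```latex
\begin{align*}
\text{free parameters in } \mathbf{C}
  &= DM + 1 - \tfrac{1}{2} M (M - 1) \\
M = D - 1: \quad
  & D(D - 1) + 1 - \tfrac{1}{2} (D - 1)(D - 2) = \tfrac{1}{2} D (D + 1) \\
\alpha_i \to \infty \ \text{for all } i: \quad
  & \mathbf{w}_i \to \mathbf{0}
    \;\Longrightarrow\; \mathbf{C} \to \sigma^2 \mathbf{I}
\end{align*}
```

The second line matches the $D(D+1)/2$ parameters of a general symmetric covariance matrix, and the third recovers the isotropic Gaussian. For $D = 4$ and $M = 3$, the count is $12 + 1 - 3 = 10 = \tfrac{1}{2} \cdot 4 \cdot 5$.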

12.2.4 Factor analysis

Factor analysis is a linear-Gaussian latent variable model that is closely related to probabilistic PCA. Its definition differs from that of probabilistic PCA only in that the conditional distribution of the observed variable $\mathbf{x}$ given the latent variable $\mathbf{z}$ is

