Pattern Recognition and Machine Learning

Section 12.2.2

Section 12.2.3

Section 8.1.4

12.2. Probabilistic PCA

We can derive an EM algorithm for PCA that is computationally efficient in
situations where only a few leading eigenvectors are required and that avoids
having to evaluate the data covariance matrix as an intermediate step.

The combination of a probabilistic model and EM allows us to deal with
missing values in the data set.

Mixtures of probabilistic PCA models can be formulated in a principled way
and trained using the EM algorithm.

Probabilistic PCA forms the basis for a Bayesian treatment of PCA in which
the dimensionality of the principal subspace can be found automatically from
the data.

The existence of a likelihood function allows direct comparison with other
probabilistic density models. By contrast, conventional PCA will assign a low
reconstruction cost to data points that are close to the principal subspace even
if they lie arbitrarily far from the training data.

Probabilistic PCA can be used to model class-conditional densities and hence
be applied to classification problems.

The probabilistic PCA model can be run generatively to provide samples from
the distribution.
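The remark above about reconstruction cost can be illustrated with a small numerical sketch. It assumes NumPy, and the toy data set and the helper names `fit_pca` and `reconstruction_cost` are hypothetical choices made only for this illustration. A point is placed inside the principal subspace but very far from every training point, and conventional PCA still gives it an essentially zero reconstruction cost.

```python
import numpy as np

def fit_pca(X, M):
    """Conventional PCA: return the data mean and the top-M principal directions."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:M].T  # shape (D, M), orthonormal columns

def reconstruction_cost(x, mean, U):
    """Squared distance from x to its projection onto the principal subspace."""
    centred = x - mean
    residual = centred - U @ (U.T @ centred)
    return float(residual @ residual)

rng = np.random.default_rng(0)
# Toy data (an assumption for illustration): 2-D points stretched along the first axis.
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.3])
mean, U = fit_pca(X, M=1)

# A point lying in the principal subspace but far from all of the training data.
far_point = mean + 1000.0 * U[:, 0]

typical_cost = reconstruction_cost(X[0], mean, U)    # cost of an ordinary training point
far_cost = reconstruction_cost(far_point, mean, U)   # cost of the distant point
nearest = np.min(np.linalg.norm(X - far_point, axis=1))
print(typical_cost, far_cost, nearest)
```

The reconstruction cost alone cannot tell the distant point apart from a well-modelled one, whereas a density model with a likelihood function would assign it a very low probability.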

This formulation of PCA as a probabilistic model was proposed independently by Tipping and Bishop (1997, 1999b) and by Roweis (1998). As we shall see later, it is closely related to factor analysis (Basilevsky, 1994).

Probabilistic PCA is a simple example of the linear-Gaussian framework, in which all of the marginal and conditional distributions are Gaussian. We can formulate probabilistic PCA by first introducing an explicit latent variable $\mathbf{z}$ corresponding to the principal-component subspace. Next we define a Gaussian prior distribution $p(\mathbf{z})$ over the latent variable, together with a Gaussian conditional distribution $p(\mathbf{x} \mid \mathbf{z})$ for the observed variable $\mathbf{x}$ conditioned on the value of the latent variable. Specifically, the prior distribution over $\mathbf{z}$ is given by a zero-mean unit-covariance Gaussian

$$p(\mathbf{z}) = \mathcal{N}(\mathbf{z} \mid \mathbf{0}, \mathbf{I}). \tag{12.31}$$

Similarly, the conditional distribution of the observed variable $\mathbf{x}$, conditioned on the value of the latent variable $\mathbf{z}$, is again Gaussian, of the form

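$$p(\mathbf{x} \mid \mathbf{z}) = \mathcal{N}(\mathbf{x} \mid \mathbf{W}\mathbf{z} + \boldsymbol{\mu}, \sigma^2 \mathbf{I}) \tag{12.32}$$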

Section 8.2.2

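in which the mean of $\mathbf{x}$ is a general linear function of $\mathbf{z}$ governed by the $D \times M$ matrix $\mathbf{W}$ and the $D$-dimensional vector $\boldsymbol{\mu}$. Note that, because the covariance $\sigma^2 \mathbf{I}$ is diagonal, this distribution factorizes with respect to the elements of $\mathbf{x}$; in other words, this is an example of the naive Bayes model. As we shall see shortly, the columns of $\mathbf{W}$ span a linear subspace within the data space that corresponds to the principal subspace. The other parameter in this model is the scalar $\sigma^2$ governing the variance of the conditional distribution. Note that there is no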
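The two distributions (12.31) and (12.32) together define a generative procedure: draw a latent vector from the prior, then draw the observation from the conditional. A minimal sketch of this ancestral sampling is given below, assuming NumPy. The function name `sample_ppca` and the particular values chosen for W, mu and sigma2 are hypothetical and serve only as an illustration.

```python
import numpy as np

def sample_ppca(W, mu, sigma2, n_samples, rng):
    """Draw samples from the probabilistic PCA model by ancestral sampling."""
    D, M = W.shape
    # Prior (12.31): z ~ N(0, I) in the M-dimensional latent space.
    Z = rng.standard_normal(size=(n_samples, M))
    # Conditional (12.32): x | z ~ N(W z + mu, sigma2 * I).
    # The noise is isotropic, so the elements of x are independent given z.
    noise = np.sqrt(sigma2) * rng.standard_normal(size=(n_samples, D))
    X = Z @ W.T + mu + noise
    return X, Z

rng = np.random.default_rng(0)
# Illustrative parameter values (assumptions): D = 3 observed dimensions, M = 1 latent dimension.
W = np.array([[2.0], [1.0], [0.0]])
mu = np.array([1.0, -1.0, 0.5])
sigma2 = 0.1

X, Z = sample_ppca(W, mu, sigma2, n_samples=20000, rng=rng)

# Given z, what remains of x after removing the mean W z + mu is pure isotropic noise.
residual = X - (Z @ W.T + mu)
print(X.mean(axis=0))
print(np.cov(residual, rowvar=False))
```

In this sketch the empirical covariance of the residual should be close to sigma2 times the identity, which is the numerical counterpart of the factorization over the elements of x noted in the text.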

