12.2. Probabilistic PCA

dimensionality. If we restrict the covariance matrix to be diagonal, then it has only D independent parameters, and so the number of parameters now grows linearly with dimensionality. However, it now treats the variables as if they were independent and hence can no longer express any correlations between them. Probabilistic PCA provides an elegant compromise in which the M most significant correlations can be captured while still ensuring that the total number of parameters grows only linearly with D. We can see this by evaluating the number of degrees of freedom in the PPCA model as follows. The covariance matrix C depends on the parameters W, which has size D × M, and σ², giving a total parameter count of DM + 1. However, we have seen that there is some redundancy in this parameterization associated with rotations of the coordinate system in the latent space. The orthogonal matrix R that expresses these rotations has size M × M. In the first column of this matrix there are M − 1 independent parameters, because the column vector must be normalized to unit length. In the second column there are M − 2 independent parameters, because the column must be normalized and must also be orthogonal to the previous column, and so on. Summing this arithmetic series, we see that R has a total of M(M − 1)/2 independent parameters. Thus the number of degrees of freedom in the covariance matrix C is given by

DM + 1 − M(M − 1)/2.        (12.51)
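
As a quick numerical illustration of (12.51), with arbitrarily chosen values D = 100 and M = 5 (our own example, not from the text), the probabilistic PCA covariance has

100 × 5 + 1 − 5 × 4/2 = 491

degrees of freedom, compared with D(D + 1)/2 = 5050 for a full covariance matrix and D = 100 for a diagonal one.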


The number of independent parameters in this model therefore grows only linearly with D, for fixed M. If we take M = D − 1, then we recover the standard result for a full covariance Gaussian (Exercise 12.14). In this case, the variance along D − 1 linearly independent directions is controlled by the columns of W, and the variance along the remaining direction is given by σ². If M = 0, the model is equivalent to the isotropic covariance case.
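
As a small self-contained check (our own illustration, not from the text), the following NumPy sketch constructs the probabilistic PCA marginal covariance C = W Wᵀ + σ²I and evaluates (12.51); setting M = 0 reproduces the isotropic covariance, as noted above.

```python
import numpy as np

def ppca_covariance(W, sigma2):
    """Marginal covariance C = W W^T + sigma^2 I of probabilistic PCA."""
    D = W.shape[0]
    return W @ W.T + sigma2 * np.eye(D)

def ppca_dof(D, M):
    """Degrees of freedom of C, equation (12.51)."""
    return D * M + 1 - M * (M - 1) // 2

rng = np.random.default_rng(0)
D, M, sigma2 = 5, 2, 0.3

C = ppca_covariance(rng.standard_normal((D, M)), sigma2)
print(ppca_dof(D, M))                         # 5*2 + 1 - 1 = 10

# With M = 0 (no columns in W), C reduces to the isotropic case sigma^2 I.
C0 = ppca_covariance(np.zeros((D, 0)), sigma2)
print(np.allclose(C0, sigma2 * np.eye(D)))    # True
```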

12.2.2 EM algorithm for PCA


As we have seen, the probabilistic PCA model can be expressed in terms of a marginalization over a continuous latent space z in which, for each data point x_n, there is a corresponding latent variable z_n. We can therefore make use of the EM algorithm (Section 9.4) to find maximum likelihood estimates of the model parameters. This may seem rather pointless because we have already obtained an exact closed-form solution for the maximum likelihood parameter values. However, in spaces of high dimensionality, there may be computational advantages in using an iterative EM procedure rather than working directly with the sample covariance matrix. This EM procedure can also be extended to the factor analysis model (Section 12.2.4), for which there is no closed-form solution. Finally, it allows missing data to be handled in a principled way.
We can derive the EM algorithm for probabilistic PCA by following the general framework for EM. Thus we write down the complete-data log likelihood and take its expectation with respect to the posterior distribution of the latent variables, evaluated using the 'old' parameter values. Maximization of this expected complete-data log likelihood then yields the 'new' parameter values. Because the data points are observed independently, the complete-data log likelihood decomposes into a sum of contributions, one for each pair (x_n, z_n).
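
To preview where this derivation leads, the following NumPy sketch iterates the two steps just described. It is a minimal illustration, assuming the standard probabilistic PCA updates of Tipping and Bishop (derived later in this section); the function and variable names are our own.

```python
import numpy as np

def ppca_em(X, M, n_iter=100, seed=0):
    """EM for probabilistic PCA on an N x D data matrix X.

    The mean is fixed at the sample mean; returns W (D x M) and sigma^2.
    """
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Xc = X - X.mean(axis=0)                    # centered data points
    W = rng.standard_normal((D, M))            # initial 'old' parameters
    sigma2 = 1.0

    for _ in range(n_iter):
        # E step: posterior moments of the latent variables z_n under
        # the 'old' parameters, using the M x M matrix W^T W + sigma^2 I.
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))
        Ez = Xc @ W @ Minv                     # rows are E[z_n]
        Ezz = N * sigma2 * Minv + Ez.T @ Ez    # sum_n E[z_n z_n^T]

        # M step: maximize the expected complete-data log likelihood.
        W = Xc.T @ Ez @ np.linalg.inv(Ezz)     # 'new' W
        sigma2 = (np.sum(Xc**2)
                  - 2.0 * np.sum((Xc @ W) * Ez)
                  + np.trace(Ezz @ W.T @ W)) / (N * D)
    return W, sigma2

# Example: recover a 2-dimensional subspace from noisy 10-dimensional data.
rng = np.random.default_rng(1)
latent = rng.standard_normal((500, 2))
X = latent @ rng.standard_normal((2, 10)) + 0.1 * rng.standard_normal((500, 10))
W_fit, sigma2_fit = ppca_em(X, M=2)
```

Note that each iteration involves only O(NDM) matrix products, which is the origin of the computational advantage mentioned above when D is large, since it avoids forming the D × D sample covariance matrix.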