dimensionality. If we restrict the covariance matrix to be diagonal, then it has only $D$
independent parameters, and so the number of parameters now grows linearly with
dimensionality. However, it now treats the variables as if they were independent and
hence can no longer express any correlations between them. Probabilistic PCA
provides an elegant compromise in which the $M$ most significant correlations can be
captured while still ensuring that the total number of parameters grows only linearly
with $D$. We can see this by evaluating the number of degrees of freedom in the
PPCA model as follows. The covariance matrix $\mathbf{C}$ depends on the parameters $\mathbf{W}$,
which has size $D \times M$, and $\sigma^2$, giving a total parameter count of $DM + 1$. However,
we have seen that there is some redundancy in this parameterization associated with
rotations of the coordinate system in the latent space. The orthogonal matrix $\mathbf{R}$ that
expresses these rotations has size $M \times M$. In the first column of this matrix there are
$M - 1$ independent parameters, because the column vector must be normalized to
unit length. In the second column there are $M - 2$ independent parameters, because
the column must be normalized and also must be orthogonal to the previous column,
and so on. Summing this arithmetic series, we see that $\mathbf{R}$ has a total of $M(M-1)/2$
independent parameters. Thus the number of degrees of freedom in the covariance
matrix $\mathbf{C}$ is given by
$$DM + 1 - M(M-1)/2. \tag{12.51}$$
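The rotational redundancy described above can be checked numerically. The following is a minimal sketch, assuming the PPCA covariance takes the form C = W W^T + sigma^2 I from the model definition earlier in the chapter. The sizes D = 6 and M = 3, the value of `sigma2`, and the random seed are arbitrary illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, sigma2 = 6, 3, 0.5  # illustrative sizes and noise variance (assumptions)

# An arbitrary D x M parameter matrix W.
W = rng.normal(size=(D, M))

# A random M x M orthogonal matrix R, taken from the QR decomposition
# of a Gaussian matrix (the Q factor has orthonormal columns).
R, _ = np.linalg.qr(rng.normal(size=(M, M)))

# Covariance before and after rotating the latent coordinates, W -> W R.
C = W @ W.T + sigma2 * np.eye(D)
C_rot = (W @ R) @ (W @ R).T + sigma2 * np.eye(D)

# R R^T = I, so the two covariances agree up to floating-point error.
print(np.allclose(C, C_rot))
```

Because W and W R give the same C, the M(M-1)/2 parameters of R cannot be identified from the data, which is why they are subtracted from the raw count DM + 1.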
(Margin references: Exercise 12.14; Section 12.2.4; Section 9.4)
The number of independent parameters in this model therefore only grows linearly
with $D$, for fixed $M$. If we take $M = D - 1$, then we recover the standard result
for a full covariance Gaussian. In this case, the variance along $D - 1$ linearly
independent directions is controlled by the columns of $\mathbf{W}$, and the variance along the
remaining direction is given by $\sigma^2$. If $M = 0$, the model is equivalent to the isotropic
covariance case.
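The parameter counts discussed above can be tabulated directly from (12.51). This is a small sketch with hypothetical helper names (`ppca_dof`, `diag_dof`, `full_cov_dof`). The full-covariance count D(D+1)/2 is the usual number of independent entries of a symmetric D x D matrix, and the choice M = 2 in the loop is only for illustration.

```python
def ppca_dof(D, M):
    """Degrees of freedom of the PPCA covariance, following (12.51)."""
    # M * (M - 1) is always even, so integer division is exact.
    return D * M + 1 - M * (M - 1) // 2


def diag_dof(D):
    """Independent parameters of a diagonal covariance matrix."""
    return D


def full_cov_dof(D):
    """Independent parameters of a general symmetric D x D covariance."""
    return D * (D + 1) // 2


# Growth with dimensionality for a fixed number of latent dimensions, M = 2.
for D in (10, 100, 1000):
    print(D, diag_dof(D), ppca_dof(D, 2), full_cov_dof(D))
# 10 10 20 55
# 100 100 200 5050
# 1000 1000 2000 500500

# The two limiting cases mentioned in the text.
print(ppca_dof(10, 9) == full_cov_dof(10))  # M = D - 1: full covariance count
print(ppca_dof(10, 0))                      # M = 0: only sigma^2 remains
```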
12.2.2 EM algorithm for PCA
As we have seen, the probabilistic PCA model can be expressed in terms of a
marginalization over a continuous latent space $\mathbf{z}$ in which for each data point $\mathbf{x}_n$,
there is a corresponding latent variable $\mathbf{z}_n$. We can therefore make use of the EM
algorithm to find maximum likelihood estimates of the model parameters. This may
seem rather pointless because we have already obtained an exact closed-form
solution for the maximum likelihood parameter values. However, in spaces of high
dimensionality, there may be computational advantages in using an iterative EM
procedure rather than working directly with the sample covariance matrix. This EM
procedure can also be extended to the factor analysis model, for which there is no
closed-form solution. Finally, it allows missing data to be handled in a principled
way.
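To make the comparison between the closed-form route and the iterative route concrete, here is a hedged sketch of both on synthetic data. The update equations are not given in this passage; the code assumes the standard PPCA results (closed form from the eigendecomposition of the sample covariance, and the usual E-step and M-step with the mean fixed at the sample mean). The function names, the synthetic data, the initialization, and the iteration count are all illustrative assumptions.

```python
import numpy as np


def ppca_closed_form(X, M):
    """Closed-form ML fit via the eigendecomposition of the sample covariance."""
    S = np.cov(X, rowvar=False, bias=True)        # D x D, normalised by N
    eigvals, eigvecs = np.linalg.eigh(S)          # eigh returns ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    sigma2 = eigvals[M:].mean()                   # average discarded variance
    # The arbitrary latent-space rotation R is taken to be the identity here.
    W = eigvecs[:, :M] * np.sqrt(eigvals[:M] - sigma2)
    return W, sigma2


def avg_log_likelihood(X, W, sigma2):
    """Average log likelihood per point under N(x | mean, W W^T + sigma2 I)."""
    N, D = X.shape
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / N
    C = W @ W.T + sigma2 * np.eye(D)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (D * np.log(2 * np.pi) + logdet
                   + np.trace(np.linalg.solve(C, S)))


def ppca_em(X, M, n_iter=200, seed=0):
    """EM for PPCA; never forms the D x D sample covariance inside the loop."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    Xc = X - X.mean(axis=0)
    W, sigma2 = rng.normal(size=(D, M)), 1.0      # arbitrary initialisation
    history = []
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables (old parameters).
        Minv = np.linalg.inv(W.T @ W + sigma2 * np.eye(M))  # only M x M
        Ez = Xc @ W @ Minv                        # row n holds E[z_n]
        sum_Ezz = N * sigma2 * Minv + Ez.T @ Ez   # sum over n of E[z_n z_n^T]
        # M-step: re-estimate W, then sigma2 using the new W.
        W = (Xc.T @ Ez) @ np.linalg.inv(sum_Ezz)
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum((Xc @ W) * Ez)
                  + np.trace(sum_Ezz @ W.T @ W)) / (N * D)
        history.append(avg_log_likelihood(X, W, sigma2))
    return W, sigma2, history


# Synthetic data drawn from a PPCA-like model (illustrative values only).
rng = np.random.default_rng(1)
N, D, M = 500, 5, 2
W_true = rng.normal(size=(D, M))
X = rng.normal(size=(N, M)) @ W_true.T + 0.3 * rng.normal(size=(N, D))

W_ml, s2_ml = ppca_closed_form(X, M)
W_em, s2_em, hist = ppca_em(X, M)
ll_ml = avg_log_likelihood(X, W_ml, s2_ml)

print("closed-form average log likelihood:", ll_ml)
print("EM average log likelihood:         ", hist[-1])
```

In this sketch the only matrices inverted inside the EM loop are M x M, which is the kind of saving the text alludes to when D is large. The log likelihood is evaluated through the D x D covariance here purely for monitoring, and W from EM may differ from the closed-form W by a rotation of the latent space while giving the same covariance.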
We can derive the EM algorithm for probabilistic PCA by following the general
framework for EM. Thus we write down the complete-data log likelihood and take
its expectation with respect to the posterior distribution of the latent variables
evaluated using 'old' parameter values. Maximization of this expected
complete-data log likelihood then yields the 'new' parameter values. Because the data points