(12.58)
12.2.ProbabilisticpeA 579
eigenvectordecompositionofthesamplecovariancematrix, theEMapproachis
iterativeandsomightappeartobelessattractive. However,eachcycleoftheEM
algorithmcanbecomputationallymuchmoreefficientthanconventionalPCAin
spacesofhighdimensionality.Toseethis,wenotethattheeigendecompositionof
thecovariancematrixrequiresO(D^3 ) computation. Oftenweareinterestedonly
inthefirstMeigenvectorsandtheircorrespondingeigenvalues,inwhichcasewe
canusealgorithmsthatare 0 (MD^2 ). However,theevaluationofthecovariance
matrixitselftakes 0 (ND^2 )computations,whereNisthenumberofdatapoints.
Algorithmssuchasthesnapshotmethod(Sirovich,1987),whichassumethatthe
eigenvectorsarelinearcombinationsofthedatavectors,avoiddirectevaluationof
thecovariancematrixbutareO(N^3 )andhenceunsuitedtolargedatasets.TheEM
algorithmdescribedherealsodoesnotconstructthecovariancematrixexplicitly.
Instead,themostcomputationallydemandingstepsarethoseinvolvingsumsover
thedatasetthatare 0 (NDM).ForlargeD,andM« D,thiscanbea significant
savingcomparedto 0 (ND^2 )andcanoffsettheiterativenatureoftheEMalgorithm.
NotethatthisEMalgorithmcanbeimplementedinanon-lineforminwhich
eachD-dimensionaldatapointisreadinandprocessedandthendiscardedbefore
thenextdatapointisconsidered. Toseethis,notethatthequantitiesevaluatedin
theE step(anM-dimensionalvectorandanM x Mmatrix)canbecomputedfor
eachdatapointseparately,andintheMstepweneedtoaccumulatesumsoverdata
points,whichwecandoincrementally.Thisapproachcanbeadvantageousifboth
NandDarelarge.
Becausewenowhavea fullyprobabilisticmodelforPCA,wecandealwith
missingdata,providedthatitismissingatrandom,bymarginalizingoverthedis-
tributionoftheunobservedvariables. Againthesemissingvaluescanbetreated
usingtheEMalgorithm. Wegiveanexampleoftheuseofthisapproachfordata
visualizationinFigure12.11.
AnotherelegantfeatureoftheEMapproachisthatwecantakethelimita^2 ----t0,
correspondingtostandardPCA,andstillobtaina validEM-likealgorithm(Roweis,
1998).From(12.55),weseethattheonlyquantityweneedtocomputeintheEstep
isJE[zn].Furthermore,theMstepissimplifie~becauseM =WTW.Toemphasize
thesimplicityofthealgorithm,letusdefineXtobea matrixofsizeN xDwhose
nthrowisgivenbythevectorXn- xandsimilarlydefine 0 tobea matrixofsize
Dx Mwhosenthrowis givenbythevectorJE[zn]. TheEstep(12.54)oftheEM
algorithmforPCAthenbecomes
o= (W~dWold)-lW~dX
andtheMstep(12.56)takestheform
Wnew= XTOT(OOT)-l. (12.59)
Againthesecanbeimplementedinanon-lineform. Theseequationshavea simple
interpretationasfollows.Fromourearlierdiscussion,weseethattheE stepinvolves
anorthogonalprojectionofthedatapointsontothecurrentestimatefortheprincipal
subspace. Correspondingly,theMsteprepresentsa re-estimationoftheprincipal