eigenvector decomposition of the sample covariance matrix, the EM approach is iterative and so might appear to be less attractive. However, each cycle of the EM algorithm can be computationally much more efficient than conventional PCA in spaces of high dimensionality. To see this, we note that the eigendecomposition of the covariance matrix requires O(D^3) computation. Often we are interested only in the first M eigenvectors and their corresponding eigenvalues, in which case we can use algorithms that are O(MD^2). However, the evaluation of the covariance matrix itself takes O(ND^2) computations, where N is the number of data points. Algorithms such as the snapshot method (Sirovich, 1987), which assume that the eigenvectors are linear combinations of the data vectors, avoid direct evaluation of the covariance matrix but are O(N^3) and hence unsuited to large data sets. The EM algorithm described here also does not construct the covariance matrix explicitly. Instead, the most computationally demanding steps are those involving sums over the data set that are O(NDM). For large D, and M ≪ D, this can be a significant saving compared to O(ND^2) and can offset the iterative nature of the EM algorithm.
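To make the per-cycle cost concrete, here is a minimal NumPy sketch of one EM cycle for probabilistic PCA, based on the E-step and M-step equations referenced in this section (the function name and vectorized layout are illustrative choices, not from the text). The D × D covariance matrix is never formed, and the dominant operations are the O(NDM) products involving the centred data.

```python
import numpy as np

def ppca_em_cycle(X, W, sigma2):
    """One EM cycle for probabilistic PCA (illustrative sketch).

    X : (N, D) data matrix (rows are data points), W : (D, M), sigma2 : scalar.
    The D x D covariance matrix is never constructed; the dominant costs are
    the O(N D M) products with the centred data.
    """
    N, D = X.shape
    M_dim = W.shape[1]
    Xc = X - X.mean(axis=0)                    # centred data, O(ND)

    # E step: posterior moments of the latent variables z_n
    Mmat = sigma2 * np.eye(M_dim) + W.T @ W    # M = sigma^2 I + W^T W
    Minv = np.linalg.inv(Mmat)                 # O(M^3)
    Ez = Xc @ W @ Minv.T                       # rows are E[z_n], O(NDM)
    sumEzz = N * sigma2 * Minv + Ez.T @ Ez     # sum_n E[z_n z_n^T], O(NM^2)

    # M step: re-estimate W and sigma^2
    W_new = (Xc.T @ Ez) @ np.linalg.inv(sumEzz)            # O(NDM)
    sigma2_new = (np.sum(Xc ** 2)
                  - 2.0 * np.sum((Xc @ W_new) * Ez)
                  + np.trace(sumEzz @ W_new.T @ W_new)) / (N * D)
    return W_new, sigma2_new
```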
Note that this EM algorithm can be implemented in an on-line form in which each D-dimensional data point is read in and processed and then discarded before the next data point is considered. To see this, note that the quantities evaluated in the E step (an M-dimensional vector and an M × M matrix) can be computed for each data point separately, and in the M step we need to accumulate sums over data points, which we can do incrementally. This approach can be advantageous if both N and D are large.
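For instance, the same cycle might be sketched in on-line form as follows, with each data point used once and then discarded; only accumulators of size D × M and M × M (plus scalars) are retained. The data mean is assumed to be available, for example from a preliminary pass, and the names below are illustrative.

```python
import numpy as np

def ppca_em_cycle_online(data_stream, xbar, W, sigma2, D):
    """One on-line EM cycle: each D-dimensional point is processed and discarded."""
    M_dim = W.shape[1]
    Minv = np.linalg.inv(sigma2 * np.eye(M_dim) + W.T @ W)

    S_xz = np.zeros((D, M_dim))      # accumulates sum_n (x_n - xbar) E[z_n]^T
    S_zz = np.zeros((M_dim, M_dim))  # accumulates sum_n E[z_n z_n^T]
    sq_sum = 0.0                     # accumulates sum_n ||x_n - xbar||^2
    N = 0
    for x in data_stream:
        xc = x - xbar
        Ez = Minv @ (W.T @ xc)                    # E[z_n], an M-vector
        S_xz += np.outer(xc, Ez)
        S_zz += sigma2 * Minv + np.outer(Ez, Ez)  # E[z_n z_n^T], an M x M matrix
        sq_sum += xc @ xc
        N += 1

    # M step from the accumulated sufficient statistics
    W_new = S_xz @ np.linalg.inv(S_zz)
    sigma2_new = (sq_sum - 2.0 * np.trace(W_new.T @ S_xz)
                  + np.trace(S_zz @ W_new.T @ W_new)) / (N * D)
    return W_new, sigma2_new
```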
Because we now have a fully probabilistic model for PCA, we can deal with missing data, provided that it is missing at random, by marginalizing over the distribution of the unobserved variables. Again these missing values can be treated using the EM algorithm. We give an example of the use of this approach for data visualization in Figure 12.11.
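As a rough sketch of how the marginalization enters the E step, a data point with missing entries simply restricts the model to the observed rows of W and μ; the helper below (illustrative names, not part of the text) also shows how the missing entries can be reconstructed from the posterior over z_n.

```python
import numpy as np

def ppca_estep_missing(x, observed, W, mu, sigma2):
    """E-step quantities for one data point with missing entries (sketch).

    `observed` is a boolean mask of length D indicating which entries of x
    are present; marginalizing the unobserved components amounts to using
    only the observed rows of W and mu.
    """
    Wo = W[observed, :]                       # rows of W for observed dimensions
    xo = x[observed] - mu[observed]
    M_dim = W.shape[1]
    Minv = np.linalg.inv(sigma2 * np.eye(M_dim) + Wo.T @ Wo)
    Ez = Minv @ (Wo.T @ xo)                   # posterior mean of z_n
    Ezz = sigma2 * Minv + np.outer(Ez, Ez)    # posterior second moment
    # Expected value of the missing entries given the observed ones
    x_missing_mean = W[~observed, :] @ Ez + mu[~observed]
    return Ez, Ezz, x_missing_mean
```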
Another elegant feature of the EM approach is that we can take the limit σ² → 0, corresponding to standard PCA, and still obtain a valid EM-like algorithm (Roweis, 1998). From (12.55), we see that the only quantity we need to compute in the E step is E[z_n]. Furthermore, the M step is simplified because M = W^T W. To emphasize the simplicity of the algorithm, let us define X̃ to be a matrix of size N × D whose nth row is given by the vector x_n − x̄, and similarly define Ω to be a matrix of size M × N whose nth column is given by the vector E[z_n]. The E step (12.54) of the EM algorithm for PCA then becomes

Ω = (W_old^T W_old)^{-1} W_old^T X̃^T                              (12.58)


and the M step (12.56) takes the form

W_new = X̃^T Ω^T (Ω Ω^T)^{-1}.                                     (12.59)
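Putting (12.58) and (12.59) together gives the following sketch of the zero-noise EM iteration (the initialisation and iteration count are illustrative choices); the columns of the returned W span the principal subspace but are not in general the individual orthonormal eigenvectors.

```python
import numpy as np

def em_pca(X, M, n_iter=50, rng=None):
    """EM for standard PCA in the sigma^2 -> 0 limit (sketch of (12.58)-(12.59)).

    X : (N, D) data; returns a (D, M) matrix W whose columns span the
    M-dimensional principal subspace.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, D = X.shape
    Xc = X - X.mean(axis=0)          # rows of X-tilde: x_n - xbar
    W = rng.standard_normal((D, M))  # random initialisation
    for _ in range(n_iter):
        # E step (12.58): Omega = (W^T W)^{-1} W^T X-tilde^T, columns are E[z_n]
        Omega = np.linalg.solve(W.T @ W, W.T @ Xc.T)
        # M step (12.59): W_new = X-tilde^T Omega^T (Omega Omega^T)^{-1}
        W = Xc.T @ Omega.T @ np.linalg.inv(Omega @ Omega.T)
    return W
```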

Again these can be implemented in an on-line form. These equations have a simple interpretation as follows. From our earlier discussion, we see that the E step involves an orthogonal projection of the data points onto the current estimate for the principal subspace. Correspondingly, the M step represents a re-estimation of the principal