eigenvector decomposition of the sample covariance matrix, the EM approach is iterative and so might appear to be less attractive. However, each cycle of the EM algorithm can be computationally much more efficient than conventional PCA in spaces of high dimensionality. To see this, we note that the eigendecomposition of the covariance matrix requires O(D^3) computation. Often we are interested only in the first M eigenvectors and their corresponding eigenvalues, in which case we can use algorithms that are O(MD^2). However, the evaluation of the covariance matrix itself takes O(ND^2) computations, where N is the number of data points. Algorithms such as the snapshot method (Sirovich, 1987), which assume that the eigenvectors are linear combinations of the data vectors, avoid direct evaluation of the covariance matrix but are O(N^3) and hence unsuited to large data sets. The EM algorithm described here also does not construct the covariance matrix explicitly. Instead, the most computationally demanding steps are those involving sums over the data set that are O(NDM). For large D, and M ≪ D, this can be a significant saving compared to O(ND^2) and can offset the iterative nature of the EM algorithm.
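To make the per-cycle cost concrete, here is a minimal NumPy sketch of one EM cycle for probabilistic PCA, based on the E-step and M-step equations referenced in this section (the function name and vectorized layout are illustrative choices, not from the text). The D × D covariance matrix is never formed, and the dominant operations are the O(NDM) products involving the centred data.

```python
import numpy as np

def ppca_em_cycle(X, W, sigma2):
    """One EM cycle for probabilistic PCA (illustrative sketch).

    X : (N, D) data matrix (rows are data points), W : (D, M), sigma2 : scalar.
    The D x D covariance matrix is never constructed; the dominant costs are
    the O(N D M) products with the centred data.
    """
    N, D = X.shape
    M_dim = W.shape[1]
    Xc = X - X.mean(axis=0)                    # centred data, O(ND)

    # E step: posterior moments of the latent variables z_n
    Mmat = sigma2 * np.eye(M_dim) + W.T @ W    # M = sigma^2 I + W^T W
    Minv = np.linalg.inv(Mmat)                 # O(M^3)
    Ez = Xc @ W @ Minv.T                       # rows are E[z_n], O(NDM)
    sumEzz = N * sigma2 * Minv + Ez.T @ Ez     # sum_n E[z_n z_n^T], O(NM^2)

    # M step: re-estimate W and sigma^2
    W_new = (Xc.T @ Ez) @ np.linalg.inv(sumEzz)            # O(NDM)
    sigma2_new = (np.sum(Xc ** 2)
                  - 2.0 * np.sum((Xc @ W_new) * Ez)
                  + np.trace(sumEzz @ W_new.T @ W_new)) / (N * D)
    return W_new, sigma2_new
```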
Note that this EM algorithm can be implemented in an on-line form in which each D-dimensional data point is read in and processed and then discarded before the next data point is considered. To see this, note that the quantities evaluated in the E step (an M-dimensional vector and an M × M matrix) can be computed for each data point separately, and in the M step we need to accumulate sums over data points, which we can do incrementally. This approach can be advantageous if both N and D are large.
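For instance, the same cycle might be sketched in on-line form as follows, with each data point used once and then discarded; only accumulators of size D × M and M × M (plus scalars) are retained. The data mean is assumed to be available, for example from a preliminary pass, and the names below are illustrative.

```python
import numpy as np

def ppca_em_cycle_online(data_stream, xbar, W, sigma2, D):
    """One on-line EM cycle: each D-dimensional point is processed and discarded."""
    M_dim = W.shape[1]
    Minv = np.linalg.inv(sigma2 * np.eye(M_dim) + W.T @ W)

    S_xz = np.zeros((D, M_dim))      # accumulates sum_n (x_n - xbar) E[z_n]^T
    S_zz = np.zeros((M_dim, M_dim))  # accumulates sum_n E[z_n z_n^T]
    sq_sum = 0.0                     # accumulates sum_n ||x_n - xbar||^2
    N = 0
    for x in data_stream:
        xc = x - xbar
        Ez = Minv @ (W.T @ xc)                    # E[z_n], an M-vector
        S_xz += np.outer(xc, Ez)
        S_zz += sigma2 * Minv + np.outer(Ez, Ez)  # E[z_n z_n^T], an M x M matrix
        sq_sum += xc @ xc
        N += 1

    # M step from the accumulated sufficient statistics
    W_new = S_xz @ np.linalg.inv(S_zz)
    sigma2_new = (sq_sum - 2.0 * np.trace(W_new.T @ S_xz)
                  + np.trace(S_zz @ W_new.T @ W_new)) / (N * D)
    return W_new, sigma2_new
```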
Because we now have a fully probabilistic model for PCA, we can deal with missing data, provided that it is missing at random, by marginalizing over the distribution of the unobserved variables. Again these missing values can be treated using the EM algorithm. We give an example of the use of this approach for data visualization in Figure 12.11.
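As a rough sketch of how the marginalization enters the E step, a data point with missing entries simply restricts the model to the observed rows of W and μ; the helper below (illustrative names, not part of the text) also shows how the missing entries can be reconstructed from the posterior over z_n.

```python
import numpy as np

def ppca_estep_missing(x, observed, W, mu, sigma2):
    """E-step quantities for one data point with missing entries (sketch).

    `observed` is a boolean mask of length D indicating which entries of x
    are present; marginalizing the unobserved components amounts to using
    only the observed rows of W and mu.
    """
    Wo = W[observed, :]                       # rows of W for observed dimensions
    xo = x[observed] - mu[observed]
    M_dim = W.shape[1]
    Minv = np.linalg.inv(sigma2 * np.eye(M_dim) + Wo.T @ Wo)
    Ez = Minv @ (Wo.T @ xo)                   # posterior mean of z_n
    Ezz = sigma2 * Minv + np.outer(Ez, Ez)    # posterior second moment
    # Expected value of the missing entries given the observed ones
    x_missing_mean = W[~observed, :] @ Ez + mu[~observed]
    return Ez, Ezz, x_missing_mean
```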
Another elegant feature of the EM approach is that we can take the limit σ² → 0, corresponding to standard PCA, and still obtain a valid EM-like algorithm (Roweis, 1998). From (12.55), we see that the only quantity we need to compute in the E step is E[z_n]. Furthermore, the M step is simplified because M = W^T W. To emphasize the simplicity of the algorithm, let us define X̃ to be a matrix of size N × D whose nth row is given by the vector x_n − x̄, and similarly define Ω to be a matrix of size M × N whose nth column is given by the vector E[z_n]. The E step (12.54) of the EM algorithm for PCA then becomes

Ω = (W_old^T W_old)^{-1} W_old^T X̃^T                              (12.58)


and the M step (12.56) takes the form

W_new = X̃^T Ω^T (Ω Ω^T)^{-1}.                                     (12.59)
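Putting (12.58) and (12.59) together gives the following sketch of the zero-noise EM iteration (the initialisation and iteration count are illustrative choices); the columns of the returned W span the principal subspace but are not in general the individual orthonormal eigenvectors.

```python
import numpy as np

def em_pca(X, M, n_iter=50, rng=None):
    """EM for standard PCA in the sigma^2 -> 0 limit (sketch of (12.58)-(12.59)).

    X : (N, D) data; returns a (D, M) matrix W whose columns span the
    M-dimensional principal subspace.
    """
    rng = np.random.default_rng() if rng is None else rng
    N, D = X.shape
    Xc = X - X.mean(axis=0)          # rows of X-tilde: x_n - xbar
    W = rng.standard_normal((D, M))  # random initialisation
    for _ in range(n_iter):
        # E step (12.58): Omega = (W^T W)^{-1} W^T X-tilde^T, columns are E[z_n]
        Omega = np.linalg.solve(W.T @ W, W.T @ Xc.T)
        # M step (12.59): W_new = X-tilde^T Omega^T (Omega Omega^T)^{-1}
        W = Xc.T @ Omega.T @ np.linalg.inv(Omega @ Omega.T)
    return W
```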

Again these can be implemented in an on-line form. These equations have a simple interpretation as follows. From our earlier discussion, we see that the E step involves an orthogonal projection of the data points onto the current estimate for the principal subspace. Correspondingly, the M step represents a re-estimation of the principal