Pattern Recognition and Machine Learning

12.2. Probabilistic PCA

Again, we shall assume that the eigenvectors have been arranged in order of decreasing values of the corresponding eigenvalues, so that the $M$ principal eigenvectors are $\mathbf{u}_1, \ldots, \mathbf{u}_M$. In this case, the columns of $\mathbf{W}$ define the principal subspace of standard PCA. The corresponding maximum likelihood solution for $\sigma^2$ is then given by

$$\sigma^2_{\mathrm{ML}} = \frac{1}{D - M} \sum_{i=M+1}^{D} \lambda_i \tag{12.46}$$

Section 12.2.2


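so that $\sigma^2_{\mathrm{ML}}$ is the average variance associated with the discarded dimensions.
Because $\mathbf{R}$ is orthogonal, it can be interpreted as an $M \times M$ rotation matrix in the latent space. If we substitute the solution for $\mathbf{W}$ into the expression for $\mathbf{C}$, and make use of the orthogonality property $\mathbf{R}\mathbf{R}^{\mathrm{T}} = \mathbf{I}$, we see that $\mathbf{C}$ is independent of $\mathbf{R}$.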


This simply says that the predictive density is unchanged by rotations in the latent space, as discussed earlier. For the particular case of $\mathbf{R} = \mathbf{I}$, we see that the columns of $\mathbf{W}$ are the principal component eigenvectors scaled by the square roots of the variance parameters $\lambda_i - \sigma^2$. The interpretation of these scaling factors is clear once we recognize that for a convolution of independent Gaussian distributions (in this case the latent space distribution and the noise model) the variances are additive. Thus the variance $\lambda_i$ in the direction of an eigenvector $\mathbf{u}_i$ is composed of the sum of a contribution $\lambda_i - \sigma^2$ from the projection of the unit-variance latent space distribution into data space through the corresponding column of $\mathbf{W}$, plus an isotropic contribution of variance $\sigma^2$ which is added in all directions by the noise model.
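The rotation invariance described above can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the closed-form solution $\mathbf{W}_{\mathrm{ML}} = \mathbf{U}_M(\mathbf{L}_M - \sigma^2\mathbf{I})^{1/2}\mathbf{R}$ and the covariance $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$ from earlier in this section, and it uses a randomly generated matrix `S` as a stand-in for a data covariance. The names `w_ml` and `cov` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2

# Hypothetical data covariance S: any symmetric positive-definite matrix will do.
A = rng.normal(size=(D, D))
S = A @ A.T / D

# Eigen-decomposition, sorted by decreasing eigenvalue.
lam, U = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]

# (12.46): sigma^2_ML is the mean of the D - M discarded eigenvalues.
sigma2 = lam[M:].mean()


def w_ml(R):
    """W_ML = U_M (L_M - sigma^2 I)^{1/2} R for an orthogonal M x M matrix R."""
    return U[:, :M] @ np.diag(np.sqrt(lam[:M] - sigma2)) @ R


def cov(W):
    """Model covariance C = W W^T + sigma^2 I."""
    return W @ W.T + sigma2 * np.eye(D)


# Draw an arbitrary orthogonal R from the QR decomposition of a random matrix.
R, _ = np.linalg.qr(rng.normal(size=(M, M)))

C_identity = cov(w_ml(np.eye(M)))
C_rotated = cov(w_ml(R))

# The two covariances agree up to floating-point error, because R R^T = I.
print(np.max(np.abs(C_identity - C_rotated)))

# With R = I, the squared length of column i of W is lambda_i - sigma^2.
print(np.sum(w_ml(np.eye(M)) ** 2, axis=0), lam[:M] - sigma2)
```

The square root in `w_ml` matters: it is the squared column norm, not the norm itself, that equals $\lambda_i - \sigma^2$, which is what makes the variances add up to $\lambda_i$.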
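It is worth taking a moment to study the form of the covariance matrix given by (12.36). Consider the variance of the predictive distribution along some direction specified by the unit vector $\mathbf{v}$, where $\mathbf{v}^{\mathrm{T}}\mathbf{v} = 1$, which is given by $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v}$. First suppose that $\mathbf{v}$ is orthogonal to the principal subspace, in other words it is given by some linear combination of the discarded eigenvectors. Then $\mathbf{v}^{\mathrm{T}}\mathbf{U} = \mathbf{0}$ and hence $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} = \sigma^2$. Thus the model predicts a noise variance orthogonal to the principal subspace, which, from (12.46), is just the average of the discarded eigenvalues. Now suppose that $\mathbf{v} = \mathbf{u}_i$ where $\mathbf{u}_i$ is one of the retained eigenvectors defining the principal subspace. Then $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} = (\lambda_i - \sigma^2) + \sigma^2 = \lambda_i$. In other words, this model correctly captures the variance of the data along the principal axes, and approximates the variance in all remaining directions with a single average value $\sigma^2$.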
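The two cases above can be illustrated with a short numerical sketch. It is not from the original text: it assumes $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$ with the maximum likelihood $\mathbf{W}$ for $\mathbf{R} = \mathbf{I}$, and it uses a random symmetric positive-definite matrix `S` as a hypothetical data covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 6, 2

# Hypothetical data covariance and its eigen-decomposition.
A = rng.normal(size=(D, D))
S = A @ A.T / D
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]  # eigh returns ascending order; reverse it

# Maximum likelihood parameters with R = I.
sigma2 = lam[M:].mean()
W = U[:, :M] * np.sqrt(lam[:M] - sigma2)  # scales column i by sqrt(lambda_i - sigma^2)
C = W @ W.T + sigma2 * np.eye(D)

# Case 1: v is a retained eigenvector u_i, so v^T C v should equal lambda_i.
retained = np.array([U[:, i] @ C @ U[:, i] for i in range(M)])

# Case 2: v is a unit-length combination of discarded eigenvectors,
# so v^T C v should equal sigma^2.
v = U[:, M:] @ rng.normal(size=D - M)
v /= np.linalg.norm(v)
discarded = v @ C @ v

print(retained, lam[:M])
print(discarded, sigma2)
```

Any direction in the discarded subspace gives the same value $\sigma^2$, regardless of how the individual discarded eigenvalues differ from one another.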
One way to construct the maximum likelihood density model would simply be to find the eigenvectors and eigenvalues of the data covariance matrix and then to evaluate $\mathbf{W}$ and $\sigma^2$ using the results given above. In this case, we would choose $\mathbf{R} = \mathbf{I}$ for convenience. However, if the maximum likelihood solution is found by numerical optimization of the likelihood function, for instance using an algorithm such as conjugate gradients (Fletcher, 1987; Nocedal and Wright, 1999; Bishop and Nabney, 2008) or through the EM algorithm, then the resulting value of $\mathbf{R}$ is essentially arbitrary. This implies that the columns of $\mathbf{W}$ need not be orthogonal. If an orthogonal basis is required, the matrix $\mathbf{W}$ can be post-processed appropriately (Golub and Van Loan, 1996). Alternatively, the EM algorithm can be modified in such a way as to yield orthonormal principal directions, sorted in descending order of the corresponding eigenvalues, directly (Ahn and Oh, 2003).
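As one possible illustration of such post-processing, the sketch below applies a thin singular value decomposition to $\mathbf{W}$. This is an assumption on our part, not a procedure prescribed by the text or by the cited references. The optimizer's output is simulated by multiplying the closed-form solution by a random orthogonal $\mathbf{R}$, and `S` is again a hypothetical data covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
D, M = 6, 3

# Hypothetical data covariance and its sorted eigen-decomposition.
A = rng.normal(size=(D, D))
S = A @ A.T / D
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]
sigma2 = lam[M:].mean()

# Simulate what an iterative optimizer might return: the maximum likelihood
# solution multiplied by an arbitrary orthogonal matrix R.
R, _ = np.linalg.qr(rng.normal(size=(M, M)))
W = (U[:, :M] * np.sqrt(lam[:M] - sigma2)) @ R

# W^T W = R^T (L_M - sigma^2 I) R, which is generally not diagonal,
# so the columns of W are generally not mutually orthogonal.
gram = W.T @ W

# Thin SVD: W = P diag(s) Q^T, where P has orthonormal columns.
P, s, Qt = np.linalg.svd(W, full_matrices=False)

# P spans the same subspace as W, and the squared singular values recover
# lambda_i - sigma^2 in descending order.
recovered_eigenvalues = s**2 + sigma2
print(recovered_eigenvalues, lam[:M])
```

Because `np.linalg.svd` returns singular values in descending order, the columns of `P` come out sorted by eigenvalue. Each column matches the corresponding principal eigenvector only up to sign, and only when the retained eigenvalues are distinct.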