Pattern Recognition and Machine Learning

12.2. Probabilistic PCA

Again, we shall assume that the eigenvectors have been arranged in order of decreasing values of the corresponding eigenvalues, so that the $M$ principal eigenvectors are $\mathbf{u}_1, \ldots, \mathbf{u}_M$. In this case, the columns of $\mathbf{W}$ define the principal subspace of standard PCA. The corresponding maximum likelihood solution for $\sigma^2$ is then given by

$$\sigma^2_{\mathrm{ML}} = \frac{1}{D-M} \sum_{i=M+1}^{D} \lambda_i \tag{12.46}$$

Section 12.2.2

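so that $\sigma^2_{\mathrm{ML}}$ is the average variance associated with the discarded dimensions.

Because $\mathbf{R}$ is orthogonal, it can be interpreted as a rotation matrix in the $M \times M$ latent space. If we substitute the solution for $\mathbf{W}$ into the expression for $\mathbf{C}$, and make use of the orthogonality property $\mathbf{R}\mathbf{R}^{\mathrm{T}} = \mathbf{I}$, we see that $\mathbf{C}$ is independent of $\mathbf{R}$.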
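The independence of $\mathbf{C}$ from $\mathbf{R}$ can be checked numerically. The NumPy sketch below assumes the forms $\mathbf{W} = \mathbf{U}_M(\mathbf{L}_M - \sigma^2\mathbf{I})^{1/2}\mathbf{R}$ and $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$ from earlier in the chapter. The eigenvalues, the dimensions, and all variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2  # illustrative data and latent dimensionalities

# Hypothetical eigen-decomposition: a random orthonormal U and
# eigenvalues already sorted in decreasing order.
U, _ = np.linalg.qr(rng.standard_normal((D, D)))
lam = np.array([5.0, 3.0, 1.0, 0.6, 0.2])

sigma2 = lam[M:].mean()            # (12.46): mean of discarded eigenvalues
scale = np.sqrt(lam[:M] - sigma2)  # diagonal of (L_M - sigma^2 I)^{1/2}


def covariance(R):
    """Return C = W W^T + sigma^2 I for W = U_M (L_M - sigma^2 I)^{1/2} R."""
    W = (U[:, :M] * scale) @ R
    return W @ W.T + sigma2 * np.eye(D)


# Compare R = I with a random orthogonal R obtained from a QR factorization.
R, _ = np.linalg.qr(rng.standard_normal((M, M)))
C_identity = covariance(np.eye(M))
C_rotated = covariance(R)

print("max |C(I) - C(R)| =", np.abs(C_identity - C_rotated).max())
```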

This simply says that the predictive density is unchanged by rotations in the latent space, as discussed earlier. For the particular case of $\mathbf{R} = \mathbf{I}$, we see that the columns of $\mathbf{W}$ are the principal component eigenvectors scaled by the square roots of the variance parameters $\lambda_i - \sigma^2$. The interpretation of these scaling factors is clear once we recognize that for a convolution of independent Gaussian distributions (in this case the latent space distribution and the noise model) the variances are additive. Thus the variance $\lambda_i$ in the direction of an eigenvector $\mathbf{u}_i$ is composed of the sum of a contribution $\lambda_i - \sigma^2$ from the projection of the unit-variance latent space distribution into data space through the corresponding column of $\mathbf{W}$, plus an isotropic contribution of variance $\sigma^2$ which is added in all directions by the noise model.

It is worth taking a moment to study the form of the covariance matrix given by (12.36). Consider the variance of the predictive distribution along some direction specified by the unit vector $\mathbf{v}$, where $\mathbf{v}^{\mathrm{T}}\mathbf{v} = 1$, which is given by $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v}$. First suppose that $\mathbf{v}$ is orthogonal to the principal subspace, in other words it is given by some linear combination of the discarded eigenvectors. Then $\mathbf{v}^{\mathrm{T}}\mathbf{U} = \mathbf{0}$ and hence $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} = \sigma^2$. Thus the model predicts a noise variance orthogonal to the principal subspace, which, from (12.46), is just the average of the discarded eigenvalues. Now suppose that $\mathbf{v} = \mathbf{u}_i$ where $\mathbf{u}_i$ is one of the retained eigenvectors defining the principal subspace. Then $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} = (\lambda_i - \sigma^2) + \sigma^2 = \lambda_i$. In other words, this model correctly captures the variance of the data along the principal axes, and approximates the variance in all remaining directions with a single average value $\sigma^2$.

One way to construct the maximum likelihood density model would simply be to find the eigenvectors and eigenvalues of the data covariance matrix and then to evaluate $\mathbf{W}$ and $\sigma^2$ using the results given above. In this case, we would choose $\mathbf{R} = \mathbf{I}$ for convenience. However, if the maximum likelihood solution is found by numerical optimization of the likelihood function, for instance using an algorithm such as conjugate gradients (Fletcher, 1987; Nocedal and Wright, 1999; Bishop and Nabney, 2008) or through the EM algorithm, then the resulting value of $\mathbf{R}$ is essentially arbitrary. This implies that the columns of $\mathbf{W}$ need not be orthogonal. If an orthogonal basis is required, the matrix $\mathbf{W}$ can be post-processed appropriately (Golub and Van Loan, 1996). Alternatively, the EM algorithm can be modified in such a way as to yield orthonormal principal directions, sorted in descending order of the corresponding eigenvalues, directly (Ahn and Oh, 2003).
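The eigen-decomposition route described above can be sketched in a few lines of NumPy. The example below is a minimal illustration on synthetic data, not a reference implementation: the data, the dimensions, and all variable names are assumptions. The SVD step at the end is only one possible way to post-process a rotated $\mathbf{W}$ into an orthonormal basis, and it is not necessarily the procedure the cited references have in mind.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, M = 500, 4, 2  # illustrative sample size and dimensionalities

# Synthetic data with a different variance per axis (assumed for the demo).
X = rng.standard_normal((N, D)) * np.array([3.0, 2.0, 1.0, 0.5])
S = np.cov(X, rowvar=False, bias=True)  # data covariance matrix

# Eigen-decomposition, reordered so that the eigenvalues are decreasing.
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]

sigma2 = lam[M:].mean()                   # (12.46)
W = U[:, :M] * np.sqrt(lam[:M] - sigma2)  # maximum likelihood W with R = I
C = W @ W.T + sigma2 * np.eye(D)          # model covariance

# Variance of the model along each eigenvector u_i, i.e. u_i^T C u_i.
var_along = np.diag(U.T @ C @ U)
print("retained :", var_along[:M], "vs eigenvalues", lam[:M])
print("discarded:", var_along[M:], "vs sigma^2", sigma2)

# Mimic an optimizer that returns W with an arbitrary rotation R:
# the columns of W_rot are in general no longer orthogonal.
R, _ = np.linalg.qr(rng.standard_normal((M, M)))
W_rot = W @ R
print("W_rot^T W_rot =\n", W_rot.T @ W_rot)

# Possible post-processing: the left singular vectors of W_rot form an
# orthonormal basis of the principal subspace, ordered by singular value,
# and the squared singular values equal lambda_i - sigma^2.
U_rec, s, _ = np.linalg.svd(W_rot, full_matrices=False)
lam_rec = s**2 + sigma2
print("recovered eigenvalues:", lam_rec)
```

The recovered directions match the retained eigenvectors only up to sign, so any comparison should use absolute values of the inner products.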

