12.2. Probabilistic PCA
Again, we shall assume that the eigenvectors have been arranged in order of decreasing
values of the corresponding eigenvalues, so that the $M$ principal eigenvectors are
$\mathbf{u}_1, \ldots, \mathbf{u}_M$. In this case, the columns of $\mathbf{W}$ define the principal subspace of standard
PCA. The corresponding maximum likelihood solution for $\sigma^2$ is then given by
$$
\sigma^2_{\mathrm{ML}} = \frac{1}{D - M} \sum_{i=M+1}^{D} \lambda_i \tag{12.46}
$$
so that $\sigma^2_{\mathrm{ML}}$ is the average variance associated with the discarded dimensions.
[Margin note: Section 12.2.2]
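As a minimal sketch of (12.46), the following Python/NumPy snippet averages the $D - M$ smallest eigenvalues of a covariance matrix. The function name `sigma2_ml` and the diagonal example covariance are illustrative assumptions, not part of the text.

```python
import numpy as np

def sigma2_ml(S, M):
    """Average of the D - M smallest eigenvalues of the covariance S, as in (12.46)."""
    D = S.shape[0]
    if not 0 <= M < D:
        raise ValueError("M must satisfy 0 <= M < D")
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order,
    # so the first D - M entries are the discarded (smallest) eigenvalues.
    lam_ascending = np.linalg.eigvalsh(S)
    return lam_ascending[: D - M].mean()

# Hypothetical covariance whose eigenvalues are 5, 3, 1 and 0.5.
S = np.diag([5.0, 3.0, 1.0, 0.5])
print(sigma2_ml(S, M=2))  # average of the two discarded eigenvalues, 1 and 0.5
```

Retaining more components leaves fewer, smaller eigenvalues in the average, so the estimated noise variance cannot increase as $M$ grows.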
Because $\mathbf{R}$ is orthogonal, it can be interpreted as an $M \times M$ rotation matrix in the
latent space. If we substitute the solution for $\mathbf{W}$ into the expression for $\mathbf{C}$, and make
use of the orthogonality property $\mathbf{R}\mathbf{R}^{\mathrm{T}} = \mathbf{I}$, we see that $\mathbf{C}$ is independent of $\mathbf{R}$.
This simply says that the predictive density is unchanged by rotations in the latent
space, as discussed earlier. For the particular case of $\mathbf{R} = \mathbf{I}$, we see that the columns
of $\mathbf{W}$ are the principal component eigenvectors scaled by the square roots
$\sqrt{\lambda_i - \sigma^2}$ of the variance parameters $\lambda_i - \sigma^2$. The interpretation of these scaling
factors is clear once we recognize that for a convolution of independent Gaussian
distributions (in this case the latent space distribution and the noise model) the
variances are additive. Thus the variance $\lambda_i$ in the direction of an eigenvector $\mathbf{u}_i$ is
composed of the sum of a contribution $\lambda_i - \sigma^2$ from the projection of the
unit-variance latent space distribution into data space through the corresponding
column of $\mathbf{W}$, plus an isotropic contribution of variance $\sigma^2$ which is added in all
directions by the noise model.
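The rotation invariance can be checked numerically. The sketch below assumes the forms $\mathbf{W} = \mathbf{U}_M(\mathbf{L}_M - \sigma^2\mathbf{I})^{1/2}\mathbf{R}$ and $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$ from the earlier equations referred to above; the eigenvalues, dimensions, and helper names `w_ml` and `covariance` are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2

# Hypothetical eigendecomposition of a data covariance:
# orthonormal eigenvectors in the columns of U, eigenvalues in decreasing order.
U, _ = np.linalg.qr(rng.standard_normal((D, D)))
lam = np.array([6.0, 4.0, 1.5, 1.0, 0.5])
sigma2 = lam[M:].mean()  # average of the discarded eigenvalues, as in (12.46)

def w_ml(R):
    """W = U_M (L_M - sigma2 I)^{1/2} R for an orthogonal M x M matrix R."""
    return U[:, :M] @ np.diag(np.sqrt(lam[:M] - sigma2)) @ R

def covariance(W):
    """Model covariance C = W W^T + sigma2 I."""
    return W @ W.T + sigma2 * np.eye(D)

# The Q factor of a random square matrix is an orthogonal matrix.
R, _ = np.linalg.qr(rng.standard_normal((M, M)))
W_identity, W_rotated = w_ml(np.eye(M)), w_ml(R)

# C is the same for both choices of R, although W itself differs.
same_C = np.allclose(covariance(W_identity), covariance(W_rotated))
# For R = I the squared column lengths of W equal lam_i - sigma2.
col_norms_sq = np.sum(W_identity**2, axis=0)
print(same_C, col_norms_sq, lam[:M] - sigma2)
```

The squared column lengths make the additive-variance reading concrete: each retained direction contributes $\lambda_i - \sigma^2$ through $\mathbf{W}$, and the noise model supplies the remaining $\sigma^2$.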
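It is worth taking a moment to study the form of the covariance matrix given
by (12.36). Consider the variance of the predictive distribution along some direction
specified by the unit vector $\mathbf{v}$, where $\mathbf{v}^{\mathrm{T}}\mathbf{v} = 1$, which is given by $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v}$. First
suppose that $\mathbf{v}$ is orthogonal to the principal subspace, in other words it is given by
some linear combination of the discarded eigenvectors. Then $\mathbf{v}^{\mathrm{T}}\mathbf{U} = \mathbf{0}$ and hence
$\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} = \sigma^2$. Thus the model predicts a noise variance orthogonal to the principal
subspace, which, from (12.46), is just the average of the discarded eigenvalues. Now
suppose that $\mathbf{v} = \mathbf{u}_i$ where $\mathbf{u}_i$ is one of the retained eigenvectors defining the
principal subspace. Then $\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} = (\lambda_i - \sigma^2) + \sigma^2 = \lambda_i$. In other words, this model
correctly captures the variance of the data along the principal axes, and approximates
the variance in all remaining directions with a single average value $\sigma^2$.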
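The two cases can be written out in a few lines. This is a sketch that assumes $\mathbf{C} = \mathbf{W}\mathbf{W}^{\mathrm{T}} + \sigma^2\mathbf{I}$ and $\mathbf{W} = \mathbf{U}(\mathbf{L} - \sigma^2\mathbf{I})^{1/2}\mathbf{R}$, where $\mathbf{U}$ holds the $M$ retained eigenvectors as columns and $\mathbf{L}$ is the diagonal matrix of the corresponding eigenvalues (notation assumed here from the earlier equations).

```latex
\begin{align*}
\mathbf{W}\mathbf{W}^{\mathrm{T}}
  &= \mathbf{U}(\mathbf{L} - \sigma^2\mathbf{I})^{1/2}\,\mathbf{R}\mathbf{R}^{\mathrm{T}}\,
     (\mathbf{L} - \sigma^2\mathbf{I})^{1/2}\mathbf{U}^{\mathrm{T}}
   = \sum_{j=1}^{M} (\lambda_j - \sigma^2)\,\mathbf{u}_j\mathbf{u}_j^{\mathrm{T}}, \\
\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v}
  &= \sum_{j=1}^{M} (\lambda_j - \sigma^2)\,(\mathbf{u}_j^{\mathrm{T}}\mathbf{v})^2
     + \sigma^2\,\mathbf{v}^{\mathrm{T}}\mathbf{v}. \\
\intertext{If $\mathbf{v}$ is orthogonal to every retained $\mathbf{u}_j$, each term in the sum vanishes:}
\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} &= \sigma^2. \\
\intertext{If $\mathbf{v} = \mathbf{u}_i$ with $i \le M$, orthonormality gives $\mathbf{u}_j^{\mathrm{T}}\mathbf{u}_i = \delta_{ij}$, so only the $j = i$ term survives:}
\mathbf{v}^{\mathrm{T}}\mathbf{C}\mathbf{v} &= (\lambda_i - \sigma^2) + \sigma^2 = \lambda_i.
\end{align*}
```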
One way to construct the maximum likelihood density model would simply be
to find the eigenvectors and eigenvalues of the data covariance matrix and then to
evaluate $\mathbf{W}$ and $\sigma^2$ using the results given above. In this case, we would choose
$\mathbf{R} = \mathbf{I}$ for convenience. However, if the maximum likelihood solution is found by
numerical optimization of the likelihood function, for instance using an algorithm
such as conjugate gradients (Fletcher, 1987; Nocedal and Wright, 1999; Bishop and
Nabney, 2008) or through the EM algorithm, then the resulting value of $\mathbf{R}$ is
essentially arbitrary. This implies that the columns of $\mathbf{W}$ need not be orthogonal. If
an orthogonal basis is required, the matrix $\mathbf{W}$ can be post-processed appropriately
(Golub and Van Loan, 1996). Alternatively, the EM algorithm can be modified in
such a way as to yield orthonormal principal directions, sorted in descending order
of the corresponding eigenvalues, directly (Ahn and Oh, 2003).
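One possible post-processing step, offered here as an illustrative sketch rather than the specific procedure in the cited references, is a singular value decomposition of $\mathbf{W}$. Assuming $\mathbf{W} = \mathbf{U}_M(\mathbf{L}_M - \sigma^2\mathbf{I})^{1/2}\mathbf{R}$ with distinct retained eigenvalues, the left singular vectors recover the principal directions up to sign, and the squared singular values plus $\sigma^2$ recover the eigenvalues. The function name `orthonormalize_w` and the synthetic values are hypothetical.

```python
import numpy as np

def orthonormalize_w(W, sigma2):
    """Return orthonormal principal directions and eigenvalues recovered from W."""
    # Thin SVD: W = A diag(s) B^T, with orthonormal columns in A and
    # singular values s sorted in descending order.
    A, s, _ = np.linalg.svd(W, full_matrices=False)
    return A, s**2 + sigma2

# Synthetic check with hypothetical eigenvectors, eigenvalues and rotation.
rng = np.random.default_rng(1)
D, M = 6, 3
U, _ = np.linalg.qr(rng.standard_normal((D, D)))   # orthonormal eigenvectors
lam = np.array([9.0, 5.0, 3.0, 1.2, 1.0, 0.8])     # decreasing eigenvalues
sigma2 = lam[M:].mean()                            # as in (12.46)
R, _ = np.linalg.qr(rng.standard_normal((M, M)))   # arbitrary orthogonal matrix
W = U[:, :M] @ np.diag(np.sqrt(lam[:M] - sigma2)) @ R

A, lam_hat = orthonormalize_w(W, sigma2)
# Absolute inner products between recovered and true directions (1 means a match up to sign).
overlap = np.abs(np.sum(A * U[:, :M], axis=0))
print(lam_hat, overlap)
```

Because the SVD returns singular values in descending order, the recovered directions come out sorted by eigenvalue without a separate sorting step.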