Pattern Recognition and Machine Learning


12. Continuous Latent Variables


Exercise 12.29


likelihood function for this model is a function of the coefficients in the linear combination. The log likelihood can be maximized using gradient-based optimization, giving rise to a particular version of independent component analysis.
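As a concrete illustration of this gradient-based maximum likelihood procedure, the sketch below fits a square, noise-free ICA model by natural-gradient ascent on the log likelihood, using the heavy-tailed latent density p(z) = 1/(π cosh(z)) introduced later in this section. This is a minimal sketch, not the book's implementation: the mixing matrix, sample size, learning rate, and step count are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 2000, 2

# Draw sources from p(z) = 1/(pi*cosh(z)) by inverse-CDF sampling,
# then mix them with an arbitrary (hypothetical) matrix A.
S = np.arcsinh(np.tan(np.pi * (rng.random((N, D)) - 0.5)))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = S @ A.T

def loglik(B):
    # Log likelihood of the unmixing matrix B: N*log|det B| + sum_j log p(z_j).
    Z = X @ B.T
    return N * np.log(abs(np.linalg.det(B))) - np.sum(np.log(np.pi * np.cosh(Z)))

B = np.eye(D)                     # unmixing matrix, to be learned
for _ in range(200):
    Z = X @ B.T
    # Natural-gradient ascent; -tanh(z) is d/dz log p(z) for this prior.
    B += 0.01 * (np.eye(D) - np.tanh(Z).T @ Z / N) @ B

print(loglik(B) > loglik(np.eye(D)))   # ascent should have increased the likelihood
```

The tanh appearing in the update is exactly the score function of the cosh prior, which is why this nonlinearity is standard in maximum likelihood ICA.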
The success of this approach requires that the latent variables have non-Gaussian distributions. To see this, recall that in probabilistic PCA (and in factor analysis) the latent-space distribution is given by a zero-mean isotropic Gaussian. The model therefore cannot distinguish between two different choices for the latent variables where these differ simply by a rotation in latent space. This can be verified directly by noting that the marginal density (12.35), and hence the likelihood function, is unchanged if we make the transformation W → WR where R is an orthogonal matrix satisfying RR^T = I, because the matrix C given by (12.36) is itself invariant.


Extending the model to allow more general Gaussian latent distributions does not change this conclusion because, as we have seen, such a model is equivalent to the zero-mean isotropic Gaussian latent variable model.
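This rotational invariance is easy to verify numerically. The following sketch (with arbitrary dimensionalities, noise variance, and a random loading matrix W) checks that C = WW^T + σ²I is unchanged under W → WR for a random orthogonal R:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2                          # arbitrary data/latent dimensionalities
W = rng.standard_normal((D, M))      # arbitrary loading matrix
sigma2 = 0.1

# A random orthogonal R (satisfying R R^T = I) from a QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((M, M)))

C  = W @ W.T + sigma2 * np.eye(D)
C2 = (W @ R) @ (W @ R).T + sigma2 * np.eye(D)

print(np.allclose(C, C2))            # True: C is invariant under W -> WR
```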
Another way to see why a Gaussian latent variable distribution in a linear model is insufficient to find independent components is to note that the principal components represent a rotation of the coordinate system in data space such as to diagonalize the covariance matrix, so that the data distribution in the new coordinates is then uncorrelated. Although zero correlation is a necessary condition for independence, it is not, however, sufficient. In practice, a common choice for the latent-variable distribution is given by

    p(z_j) = 1 / (π cosh(z_j))                    (12.90)

which has heavy tails compared to a Gaussian, reflecting the observation that many real-world distributions also exhibit this property.
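The heavy-tailed behaviour can be made concrete by comparing (12.90) with a Gaussian of matching variance (the variance of this density works out to π²/4). A small sketch, with arbitrary evaluation points:

```python
import math

def p_latent(z):
    # The ICA latent density (12.90): p(z) = 1/(pi*cosh(z)).
    return 1.0 / (math.pi * math.cosh(z))

def p_gauss(z, var):
    # Zero-mean Gaussian density with variance var.
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

var = math.pi ** 2 / 4               # variance of the density (12.90)
for z in (0.0, 2.0, 4.0, 6.0):
    ratio = p_latent(z) / p_gauss(z, var)
    print(f"z={z}: ratio={ratio:.2f}")   # the ratio grows with |z|
```

Because cosh(z) grows like e^|z| while the Gaussian decays like e^{-z²}, the ratio diverges in the tails, which is exactly what "heavy tails" means here.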
The original ICA model (Bell and Sejnowski, 1995) was based on the optimization of an objective function defined by information maximization. One advantage of a probabilistic latent variable formulation is that it helps to motivate and formulate generalizations of basic ICA. For instance, independent factor analysis (Attias, 1999a) considers a model in which the number of latent and observed variables can differ, the observed variables are noisy, and the individual latent variables have flexible distributions modelled by mixtures of Gaussians. The log likelihood for this model is maximized using EM, and the reconstruction of the latent variables is approximated using a variational approach. Many other types of model have been considered, and there is now a huge literature on ICA and its applications (Jutten and Herault, 1991; Comon et al., 1991; Amari et al., 1996; Pearlmutter and Parra, 1997; Hyvarinen and Oja, 1997; Hinton et al., 2001; Miskin and MacKay, 2001; Hojen-Sorensen et al., 2002; Choudrey and Roberts, 2003; Chan et al., 2003; Stone, 2004).

12.4.2 Autoassociative neural networks


In Chapter 5 we considered neural networks in the context of supervised learning, where the role of the network is to predict the output variables given values