Pattern Recognition and Machine Learning


12. Continuous Latent Variables


Exercise 12.29


likelihood function for this model is a function of the coefficients in the linear combination. The log likelihood can be maximized using gradient-based optimization, giving rise to a particular version of independent component analysis.
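As a concrete illustration of this gradient-based maximum likelihood procedure, the sketch below fits a square, noise-free ICA model by natural-gradient ascent on the log likelihood, using the heavy-tailed latent density p(z) = 1/(π cosh(z)) introduced later in this section. This is a minimal sketch, not the book's implementation: the mixing matrix, sample size, learning rate, and step count are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 2000, 2

# Draw sources from p(z) = 1/(pi*cosh(z)) by inverse-CDF sampling,
# then mix them with an arbitrary (hypothetical) matrix A.
S = np.arcsinh(np.tan(np.pi * (rng.random((N, D)) - 0.5)))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = S @ A.T

def loglik(B):
    # Log likelihood of the unmixing matrix B: N*log|det B| + sum_j log p(z_j).
    Z = X @ B.T
    return N * np.log(abs(np.linalg.det(B))) - np.sum(np.log(np.pi * np.cosh(Z)))

B = np.eye(D)                     # unmixing matrix, to be learned
for _ in range(200):
    Z = X @ B.T
    # Natural-gradient ascent; -tanh(z) is d/dz log p(z) for this prior.
    B += 0.01 * (np.eye(D) - np.tanh(Z).T @ Z / N) @ B

print(loglik(B) > loglik(np.eye(D)))   # ascent should have increased the likelihood
```

The tanh appearing in the update is exactly the score function of the cosh prior, which is why this nonlinearity is standard in maximum likelihood ICA.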
The success of this approach requires that the latent variables have non-Gaussian distributions. To see this, recall that in probabilistic PCA (and in factor analysis) the latent-space distribution is given by a zero-mean isotropic Gaussian. The model therefore cannot distinguish between two different choices for the latent variables where these differ simply by a rotation in latent space. This can be verified directly by noting that the marginal density (12.35), and hence the likelihood function, is unchanged if we make the transformation W → WR where R is an orthogonal matrix satisfying RR^T = I, because the matrix C given by (12.36) is itself invariant.


Extending the model to allow more general Gaussian latent distributions does not change this conclusion because, as we have seen, such a model is equivalent to the zero-mean isotropic Gaussian latent variable model.
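This rotational invariance is easy to verify numerically. The following sketch (with arbitrary dimensionalities, noise variance, and a random loading matrix W) checks that C = WW^T + σ²I is unchanged under W → WR for a random orthogonal R:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M = 5, 2                          # arbitrary data/latent dimensionalities
W = rng.standard_normal((D, M))      # arbitrary loading matrix
sigma2 = 0.1

# A random orthogonal R (satisfying R R^T = I) from a QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((M, M)))

C  = W @ W.T + sigma2 * np.eye(D)
C2 = (W @ R) @ (W @ R).T + sigma2 * np.eye(D)

print(np.allclose(C, C2))            # True: C is invariant under W -> WR
```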
Another way to see why a Gaussian latent variable distribution in a linear model is insufficient to find independent components is to note that the principal components represent a rotation of the coordinate system in data space such as to diagonalize the covariance matrix, so that the data distribution in the new coordinates is then uncorrelated. Although zero correlation is a necessary condition for independence, it is not, however, sufficient. In practice, a common choice for the latent-variable distribution is given by

    p(z_j) = 1 / (π cosh(z_j))                    (12.90)

which has heavy tails compared to a Gaussian, reflecting the observation that many real-world distributions also exhibit this property.
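The heavy-tailed behaviour can be made concrete by comparing (12.90) with a Gaussian of matching variance (the variance of this density works out to π²/4). A small sketch, with arbitrary evaluation points:

```python
import math

def p_latent(z):
    # The ICA latent density (12.90): p(z) = 1/(pi*cosh(z)).
    return 1.0 / (math.pi * math.cosh(z))

def p_gauss(z, var):
    # Zero-mean Gaussian density with variance var.
    return math.exp(-z * z / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

var = math.pi ** 2 / 4               # variance of the density (12.90)
for z in (0.0, 2.0, 4.0, 6.0):
    ratio = p_latent(z) / p_gauss(z, var)
    print(f"z={z}: ratio={ratio:.2f}")   # the ratio grows with |z|
```

Because cosh(z) grows like e^|z| while the Gaussian decays like e^{-z²}, the ratio diverges in the tails, which is exactly what "heavy tails" means here.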
The original ICA model (Bell and Sejnowski, 1995) was based on the optimization of an objective function defined by information maximization. One advantage of a probabilistic latent variable formulation is that it helps to motivate and formulate generalizations of basic ICA. For instance, independent factor analysis (Attias, 1999a) considers a model in which the number of latent and observed variables can differ, the observed variables are noisy, and the individual latent variables have flexible distributions modelled by mixtures of Gaussians. The log likelihood for this model is maximized using EM, and the reconstruction of the latent variables is approximated using a variational approach. Many other types of model have been considered, and there is now a huge literature on ICA and its applications (Jutten and Herault, 1991; Comon et al., 1991; Amari et al., 1996; Pearlmutter and Parra, 1997; Hyvarinen and Oja, 1997; Hinton et al., 2001; Miskin and MacKay, 2001; Hojen-Sorensen et al., 2002; Choudrey and Roberts, 2003; Chan et al., 2003; Stone, 2004).

12.4.2 Autoassociative neural networks


In Chapter 5 we considered neural networks in the context of supervised learning, where the role of the network is to predict the output variables given values