592 12. CONTINUOUS LATENT VARIABLES
Exercise 12.29
likelihood function for this model is a function of the coefficients in the linear
combination. The log likelihood can be maximized using gradient-based optimization,
giving rise to a particular version of independent component analysis.
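The gradient-based maximization described above can be sketched as follows. This is a minimal illustration, not the book's algorithm: it assumes a noise-free model with equal numbers of latent and observed variables, parameterizes the likelihood by an unmixing matrix (the inverse of the linear-combination coefficients), assumes the heavy-tailed latent density p(z_j) = 1/(π cosh z_j), and uses a natural-gradient update of the kind proposed by Amari et al. (1996) rather than a plain gradient step. The names `fit_ica`, `log_likelihood`, `unmixing` and `mixing` are hypothetical.

```python
import numpy as np

def fit_ica(x, n_iter=2000, lr=0.1):
    """Maximize the ICA log likelihood by natural-gradient ascent.

    x has shape (D, N); returns a D x D unmixing matrix (assumed square).
    """
    d, n = x.shape
    unmixing = np.eye(d)
    for _ in range(n_iter):
        z = unmixing @ x  # current estimates of the latent variables
        # For p(z) = 1 / (pi cosh z), d/dz log p(z) = -tanh(z).
        grad = np.eye(d) - np.tanh(z) @ z.T / n
        unmixing += lr * grad @ unmixing  # natural-gradient step
    return unmixing

def log_likelihood(unmixing, x):
    """Log likelihood of the data under the square, noise-free model."""
    z = unmixing @ x
    _, logdet = np.linalg.slogdet(unmixing)
    # log cosh(z) computed stably as logaddexp(z, -z) - log 2
    log_prior = -np.log(np.pi) - (np.logaddexp(z, -z) - np.log(2.0))
    return x.shape[1] * logdet + log_prior.sum()

# Synthetic check: mix two independent heavy-tailed (Laplace) sources.
rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 5000))
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])  # arbitrary illustrative values
observed = mixing @ sources
unmixing = fit_ica(observed)

# If the sources are recovered, this product is close to a scaled
# permutation matrix (one dominant entry per row).
print(np.round(unmixing @ mixing, 2))
```

The sources can only be recovered up to permutation, sign and scale, which is why the check looks at the structure of `unmixing @ mixing` rather than comparing it with the identity.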
The success of this approach requires that the latent variables have non-Gaussian
distributions. To see this, recall that in probabilistic PCA (and in factor analysis) the
latent-space distribution is given by a zero-mean isotropic Gaussian. The model
therefore cannot distinguish between two different choices for the latent variables
where these differ simply by a rotation in latent space. This can be verified directly
by noting that the marginal density (12.35), and hence the likelihood function, is
unchanged if we make the transformation $\mathbf{W} \to \mathbf{W}\mathbf{R}$, where $\mathbf{R}$ is an orthogonal
matrix satisfying $\mathbf{R}\mathbf{R}^{\mathrm{T}} = \mathbf{I}$, because the matrix $\mathbf{C}$ given by (12.36) is itself invariant.
Extending the model to allow more general Gaussian latent distributions does not
change this conclusion because, as we have seen, such a model is equivalent to the
zero-mean isotropic Gaussian latent variable model.
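The rotation invariance can be checked numerically. The sketch below assumes that (12.36) has the probabilistic PCA form C = W Wᵀ + σ² I and that the data have zero mean; the dimensions, the noise variance and the helper names `covariance` and `gaussian_log_likelihood` are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M, sigma2 = 5, 2, 0.3  # illustrative sizes and noise variance
W = rng.normal(size=(D, M))

# The Q factor of a QR decomposition is a random orthogonal matrix.
R, _ = np.linalg.qr(rng.normal(size=(M, M)))

def covariance(W, sigma2):
    """Marginal covariance, assuming the form C = W W^T + sigma^2 I."""
    return W @ W.T + sigma2 * np.eye(W.shape[0])

def gaussian_log_likelihood(X, C):
    """Zero-mean Gaussian log likelihood; X has shape (N, D)."""
    n, d = X.shape
    _, logdet = np.linalg.slogdet(C)
    quad = np.einsum("nd,nd->", X @ np.linalg.inv(C), X)
    return -0.5 * (n * d * np.log(2.0 * np.pi) + n * logdet + quad)

X = rng.normal(size=(100, D))  # any data set will do for this check
C = covariance(W, sigma2)
C_rot = covariance(W @ R, sigma2)  # W R R^T W^T = W W^T

print(np.allclose(C, C_rot))
print(gaussian_log_likelihood(X, C), gaussian_log_likelihood(X, C_rot))
```

Because the two covariance matrices agree, no data set can favour W over W R, which is the non-identifiability described above.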
Another way to see why a Gaussian latent variable distribution in a linear model
is insufficient to find independent components is to note that the principal
components represent a rotation of the coordinate system in data space such as to
diagonalize the covariance matrix, so that the data distribution in the new coordinates is then
uncorrelated. Although zero correlation is a necessary condition for independence
it is not, however, sufficient. In practice, a common choice for the latent-variable
distribution is given by

$$p(z_j) = \frac{1}{\pi \cosh(z_j)} = \frac{2}{\pi\left(e^{z_j} + e^{-z_j}\right)} \tag{12.90}$$

which has heavy tails compared to a Gaussian, reflecting the observation that many
real-world distributions also exhibit this property.
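Two claims in this paragraph can be illustrated with a short numerical sketch. The first part uses a small discrete example, chosen here for illustration, in which two variables have exactly zero covariance yet are clearly dependent. The second part checks that the density 1/(π cosh z) integrates to one and compares its tail with a Gaussian of the same variance. The grid limits, the evaluation point z = 8 and all function names are assumptions of this sketch.

```python
import numpy as np

# Part 1: zero correlation without independence (exact discrete example).
# x takes the values -1, 0, 1 with equal probability, and y = x**2.
values = np.array([-1.0, 0.0, 1.0])
probs = np.full(3, 1.0 / 3.0)
x, y = values, values**2
cov_xy = np.sum(probs * x * y) - np.sum(probs * x) * np.sum(probs * y)
p_joint = probs[1]               # P(x = 0, y = 0) = 1/3
p_product = probs[1] * probs[1]  # P(x = 0) P(y = 0) = 1/9
print(cov_xy, p_joint, p_product)

# Part 2: normalization and tail weight of p(z) = 1 / (pi cosh z).
def sech_density(z):
    return 1.0 / (np.pi * np.cosh(z))

def gaussian_density(z, var):
    return np.exp(-0.5 * z**2 / var) / np.sqrt(2.0 * np.pi * var)

dz = 1e-3
grid = np.arange(-40.0, 40.0, dz)
mass = np.sum(sech_density(grid)) * dz               # Riemann sum, close to 1
variance = np.sum(grid**2 * sech_density(grid)) * dz  # close to pi**2 / 4

# Ratio of the two densities far from the mean, with matched variance.
tail_ratio = sech_density(8.0) / gaussian_density(8.0, variance)
print(mass, variance, tail_ratio)
```

The covariance in the first part vanishes even though y is a deterministic function of x, and the ratio in the second part is much greater than one, reflecting the exponential rather than Gaussian decay of the tails.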
The original ICA model (Bell and Sejnowski, 1995) was based on the optimization
of an objective function defined by information maximization. One advantage
of a probabilistic latent variable formulation is that it helps to motivate and formulate
generalizations of basic ICA. For instance, independent factor analysis (Attias,
1999a) considers a model in which the number of latent and observed variables can
differ, the observed variables are noisy, and the individual latent variables have flexible
distributions modelled by mixtures of Gaussians. The log likelihood for this
model is maximized using EM, and the reconstruction of the latent variables is
approximated using a variational approach. Many other types of model have been
considered, and there is now a huge literature on ICA and its applications (Jutten
and Herault, 1991; Comon et al., 1991; Amari et al., 1996; Pearlmutter and Parra,
1997; Hyvärinen and Oja, 1997; Hinton et al., 2001; Miskin and MacKay, 2001;
Højen-Sørensen et al., 2002; Choudrey and Roberts, 2003; Chan et al., 2003; Stone,
2004).

12.4.2 Autoassociative neural networks
In Chapter 5 we considered neural networks in the context of supervised learning,
where the role of the network is to predict the output variables given values