Figure 12.18  An autoassociative multilayer perceptron having two layers of weights. Such a network is trained to map input vectors onto themselves by minimization of a sum-of-squares error. Even with nonlinear units in the hidden layer, such a network is equivalent to linear principal component analysis. Links representing bias parameters have been omitted for clarity. [Network diagram: a layer of inputs, a smaller hidden layer, and a layer of outputs.]
for the input variables. However, neural networks have also been applied to unsupervised learning, where they have been used for dimensionality reduction. This is achieved by using a network having the same number of outputs as inputs, and optimizing the weights so as to minimize some measure of the reconstruction error between inputs and outputs with respect to a set of training data.
Consider first a multilayer perceptron of the form shown in Figure 12.18, having D inputs, D output units, and M hidden units, with M < D. The targets used to train the network are simply the input vectors themselves, so that the network is attempting to map each input vector onto itself. Such a network is said to form an autoassociative mapping. Since the number of hidden units is smaller than the number of inputs, a perfect reconstruction of all input vectors is not in general possible. We therefore determine the network parameters w by minimizing an error function which captures the degree of mismatch between the input vectors and their reconstructions. In particular, we shall choose a sum-of-squares error of the form
$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} \bigl\|\mathbf{y}(\mathbf{x}_n,\mathbf{w}) - \mathbf{x}_n\bigr\|^2. \tag{12.91}$$
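As a concrete illustration (not part of the original text), the following NumPy sketch trains the network of Figure 12.18 by batch gradient descent on (12.91). The toy data set, the dimensions D = 5 and M = 2, the learning rate, and the random initialization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 500, 5, 2                          # N samples; M < D hidden units

# Illustrative data: M-dimensional latent structure plus a little noise, centred.
X = rng.normal(size=(N, M)) @ rng.normal(size=(M, D))
X += 0.05 * rng.normal(size=(N, D))
X -= X.mean(axis=0)

W1 = rng.normal(scale=0.5, size=(D, M))      # input -> hidden weights (biases omitted)
W2 = rng.normal(scale=0.5, size=(M, D))      # hidden -> output weights

eta = 0.01                                   # illustrative learning rate
for step in range(5000):
    Z = X @ W1                               # hidden unit activations (linear)
    R = Z @ W2 - X                           # residuals y(x_n, w) - x_n
    gW2 = Z.T @ R / N                        # gradients of E(w), scaled by 1/N
    gW1 = X.T @ (R @ W2.T) / N
    W1 -= eta * gW1
    W2 -= eta * gW2

E = 0.5 * np.sum((X @ W1 @ W2 - X) ** 2)     # sum-of-squares error (12.91)
print(f"final error E(w) = {E:.4f}")
```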
If the hidden units have linear activation functions, then it can be shown that the error function has a unique global minimum, and that at this minimum the network performs a projection onto the M-dimensional subspace which is spanned by the first M principal components of the data (Bourlard and Kamp, 1988; Baldi and Hornik, 1989). Thus, the vectors of weights which lead into the hidden units in Figure 12.18 form a basis set which spans the principal subspace. Note, however, that these vectors need not be orthogonal or normalized. This result is unsurprising, since both principal component analysis and the neural network are using linear dimensionality reduction and are minimizing the same sum-of-squares error function.
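This result can be checked numerically. Continuing the sketch above, the following (again illustrative) comparison tests whether the column space of W1, the weight vectors leading into the hidden units, coincides with the span of the first M principal components obtained from the SVD of the data; the tolerance atol=1e-2 is an arbitrary choice reflecting finite optimization accuracy.

```python
import numpy as np  # continues the sketch above; assumes X, W1, M exist

# Orthonormal basis for the span of the hidden-unit weight vectors.
Q, _ = np.linalg.qr(W1)

# First M principal directions from the SVD of the centred data matrix.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V_M = Vt[:M].T                                   # D x M

# The two subspaces coincide iff their orthogonal projectors agree.
P_net = Q @ Q.T
P_pca = V_M @ V_M.T
print("subspaces match:", np.allclose(P_net, P_pca, atol=1e-2))

# The learned basis spans the principal subspace but need not be orthonormal:
print("W1^T W1 =\n", W1.T @ W1)                  # generally far from the identity
```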
It might be thought that the limitations of a linear dimensionality reduction could be overcome by using nonlinear (sigmoidal) activation functions for the hidden units in the network in Figure 12.18. However, even with nonlinear hidden units, the minimum error solution is again given by the projection onto the principal component subspace (Bourlard and Kamp, 1988). There is therefore no advantage in using two-layer neural networks to perform dimensionality reduction. Standard techniques for principal component analysis (based on singular value decomposition) are guaranteed to give the correct solution in finite time, and they also generate an ordered set of eigenvalues with corresponding orthonormal eigenvectors.
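For contrast, the following self-contained sketch computes such an SVD-based solution directly; np.linalg.svd returns the singular values already sorted in descending order, and the toy data set is again an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
X -= X.mean(axis=0)                       # PCA assumes centred data

# SVD of the data matrix: X = U diag(S) V^T, with S sorted in descending order.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Ordered eigenvalues of the sample covariance matrix.
eigvals = S ** 2 / (X.shape[0] - 1)
print("ordered eigenvalues:", eigvals)

# Rows of Vt are the corresponding eigenvectors, and they are orthonormal:
print("orthonormal:", np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))
```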