Figure 12.18  An autoassociative multilayer perceptron having two layers of weights. Such a network is trained to map input vectors onto themselves by minimization of a sum-of-squares error. Even with nonlinear units in the hidden layer, such a network is equivalent to linear principal component analysis. Links representing bias parameters have been omitted for clarity. [Network diagram: a layer of inputs, a smaller hidden layer, and a layer of outputs.]
for the input variables. However, neural networks have also been applied to unsupervised learning, where they have been used for dimensionality reduction. This is achieved by using a network having the same number of outputs as inputs, and optimizing the weights so as to minimize some measure of the reconstruction error between inputs and outputs with respect to a set of training data.
Consider first a multilayer perceptron of the form shown in Figure 12.18, having D inputs, D output units, and M hidden units, with M < D. The targets used to train the network are simply the input vectors themselves, so that the network is attempting to map each input vector onto itself. Such a network is said to form an autoassociative mapping. Since the number of hidden units is smaller than the number of inputs, a perfect reconstruction of all input vectors is not in general possible. We therefore determine the network parameters w by minimizing an error function which captures the degree of mismatch between the input vectors and their reconstructions. In particular, we shall choose a sum-of-squares error of the form
$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} \bigl\|\mathbf{y}(\mathbf{x}_n,\mathbf{w}) - \mathbf{x}_n\bigr\|^2. \tag{12.91}$$
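As a concrete illustration (not part of the original text), the following NumPy sketch trains the network of Figure 12.18 by batch gradient descent on (12.91). The toy data set, the dimensions D = 5 and M = 2, the learning rate, and the random initialization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 500, 5, 2                          # N samples; M < D hidden units

# Illustrative data: M-dimensional latent structure plus a little noise, centred.
X = rng.normal(size=(N, M)) @ rng.normal(size=(M, D))
X += 0.05 * rng.normal(size=(N, D))
X -= X.mean(axis=0)

W1 = rng.normal(scale=0.5, size=(D, M))      # input -> hidden weights (biases omitted)
W2 = rng.normal(scale=0.5, size=(M, D))      # hidden -> output weights

eta = 0.01                                   # illustrative learning rate
for step in range(5000):
    Z = X @ W1                               # hidden unit activations (linear)
    R = Z @ W2 - X                           # residuals y(x_n, w) - x_n
    gW2 = Z.T @ R / N                        # gradients of E(w), scaled by 1/N
    gW1 = X.T @ (R @ W2.T) / N
    W1 -= eta * gW1
    W2 -= eta * gW2

E = 0.5 * np.sum((X @ W1 @ W2 - X) ** 2)     # sum-of-squares error (12.91)
print(f"final error E(w) = {E:.4f}")
```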
If the hidden units have linear activation functions, then it can be shown that the error function has a unique global minimum, and that at this minimum the network performs a projection onto the M-dimensional subspace which is spanned by the first M principal components of the data (Bourlard and Kamp, 1988; Baldi and Hornik, 1989). Thus, the vectors of weights which lead into the hidden units in Figure 12.18 form a basis set which spans the principal subspace. Note, however, that these vectors need not be orthogonal or normalized. This result is unsurprising, since both principal component analysis and the neural network are using linear dimensionality reduction and are minimizing the same sum-of-squares error function.
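This result can be checked numerically. Continuing the sketch above, the following (again illustrative) comparison tests whether the column space of W1, the weight vectors leading into the hidden units, coincides with the span of the first M principal components obtained from the SVD of the data; the tolerance atol=1e-2 is an arbitrary choice reflecting finite optimization accuracy.

```python
import numpy as np  # continues the sketch above; assumes X, W1, M exist

# Orthonormal basis for the span of the hidden-unit weight vectors.
Q, _ = np.linalg.qr(W1)

# First M principal directions from the SVD of the centred data matrix.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
V_M = Vt[:M].T                                   # D x M

# The two subspaces coincide iff their orthogonal projectors agree.
P_net = Q @ Q.T
P_pca = V_M @ V_M.T
print("subspaces match:", np.allclose(P_net, P_pca, atol=1e-2))

# The learned basis spans the principal subspace but need not be orthonormal:
print("W1^T W1 =\n", W1.T @ W1)                  # generally far from the identity
```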
It might be thought that the limitations of a linear dimensionality reduction could be overcome by using nonlinear (sigmoidal) activation functions for the hidden units in the network in Figure 12.18. However, even with nonlinear hidden units, the minimum error solution is again given by the projection onto the principal component subspace (Bourlard and Kamp, 1988). There is therefore no advantage in using two-layer neural networks to perform dimensionality reduction. Standard techniques for principal component analysis (based on singular value decomposition) are guaranteed to give the correct solution in finite time, and they also generate an ordered set of eigenvalues with corresponding orthonormal eigenvectors.
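For contrast, the following self-contained sketch computes such an SVD-based solution directly; np.linalg.svd returns the singular values already sorted in descending order, and the toy data set is again an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))
X -= X.mean(axis=0)                       # PCA assumes centred data

# SVD of the data matrix: X = U diag(S) V^T, with S sorted in descending order.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

# Ordered eigenvalues of the sample covariance matrix.
eigvals = S ** 2 / (X.shape[0] - 1)
print("ordered eigenvalues:", eigvals)

# Rows of Vt are the corresponding eigenvectors, and they are orthonormal:
print("orthonormal:", np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0])))
```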