12.4. Nonlinear Latent Variable Models

Figure 12.18  An autoassociative multilayer perceptron having two layers of weights. Such a network is trained to map input vectors onto themselves by minimization of a sum-of-squares error. Even with nonlinear units in the hidden layer, such a network is equivalent to linear principal component analysis. Links representing bias parameters have been omitted for clarity.

… for the input variables. However, neural networks have also been applied to unsupervised learning where they have been used for dimensionality reduction. This is achieved by using a network having the same number of outputs as inputs, and optimizing the weights so as to minimize some measure of the reconstruction error between inputs and outputs with respect to a set of training data.
Consider first a multilayer perceptron of the form shown in Figure 12.18, having D inputs, D output units and M hidden units, with M < D. The targets used to train the network are simply the input vectors themselves, so that the network is attempting to map each input vector onto itself. Such a network is said to form an autoassociative mapping. Since the number of hidden units is smaller than the number of inputs, a perfect reconstruction of all input vectors is not in general possible. We therefore determine the network parameters w by minimizing an error function which captures the degree of mismatch between the input vectors and their reconstructions. In particular, we shall choose a sum-of-squares error of the form

$$E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N} \bigl\| \mathbf{y}(\mathbf{x}_n, \mathbf{w}) - \mathbf{x}_n \bigr\|^{2}. \qquad (12.91)$$
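
As a concrete illustration (not part of the original text), the following NumPy sketch builds the network of Figure 12.18 with D inputs, M < D linear hidden units and D outputs, and minimizes the error (12.91) by plain batch gradient descent. The toy data, layer sizes, learning rate and iteration count are arbitrary choices made only for illustration.

```python
# Minimal sketch of the autoassociative network of Figure 12.18:
# D inputs, M < D linear hidden units, D outputs, trained to reproduce
# its own inputs by gradient descent on the sum-of-squares error (12.91).
# (Data, sizes and learning rate are illustrative choices, not from the text.)
import numpy as np

rng = np.random.default_rng(0)
N, D, M = 200, 5, 2                          # N data points, D inputs, M hidden units
X = rng.normal(size=(N, D)) @ rng.normal(size=(D, D))  # correlated toy data
X -= X.mean(axis=0)                          # centre the data

W1 = 0.1 * rng.normal(size=(D, M))           # weights leading into the hidden units
W2 = 0.1 * rng.normal(size=(M, D))           # weights leading into the output units

def reconstruct(X, W1, W2):
    """Network mapping y(x, w) with linear hidden units."""
    return (X @ W1) @ W2

def error(X, W1, W2):
    """Sum-of-squares reconstruction error E(w) of equation (12.91)."""
    return 0.5 * np.sum((reconstruct(X, W1, W2) - X) ** 2)

eta = 0.5 / np.linalg.norm(X.T @ X, 2)       # conservative step size
for step in range(2000):                     # plain batch gradient descent
    H = X @ W1                               # hidden-unit activations
    R = H @ W2 - X                           # residuals y(x_n, w) - x_n
    gW2 = H.T @ R                            # dE/dW2
    gW1 = X.T @ (R @ W2.T)                   # dE/dW1
    W1 -= eta * gW1
    W2 -= eta * gW2

print("final reconstruction error E(w):", error(X, W1, W2))
```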
If the hidden units have linear activation functions, then it can be shown that the error function has a unique global minimum, and that at this minimum the network performs a projection onto the M-dimensional subspace which is spanned by the first M principal components of the data (Bourlard and Kamp, 1988; Baldi and Hornik, 1989). Thus, the vectors of weights which lead into the hidden units in Figure 12.18 form a basis set which spans the principal subspace. Note, however, that these vectors need not be orthogonal or normalized. This result is unsurprising, since both principal component analysis and the neural network are using linear dimensionality reduction and are minimizing the same sum-of-squares error function.
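
The following sketch gives a rough numerical check of this statement; it is an illustration under arbitrary assumptions (a synthetic data set with two dominant directions, gradient-descent training), not part of the text. After training, the cosines of the principal angles between the span of the hidden-unit weight vectors and the span of the first M principal components should be close to one, while the Gram matrix of those weight vectors shows that the learned basis is generally not orthonormal.

```python
# Rough numerical check of the Bourlard-Kamp / Baldi-Hornik result for a
# linear autoassociative network. All numerical choices are illustrative.
import numpy as np

rng = np.random.default_rng(1)
N, D, M = 500, 5, 2
basis = np.linalg.qr(rng.normal(size=(D, M)))[0]         # true M-dim subspace
X = (rng.normal(size=(N, M)) * [5.0, 3.0]) @ basis.T     # strong M-dim signal
X += 0.1 * rng.normal(size=(N, D))                       # small isotropic noise
X -= X.mean(axis=0)                                      # centre the data

W1 = 0.1 * rng.normal(size=(D, M))                       # weights into hidden units
W2 = 0.1 * rng.normal(size=(M, D))                       # weights into outputs
eta = 0.5 / np.linalg.norm(X.T @ X, 2)
for step in range(5000):                                 # gradient descent on E(w)
    H = X @ W1
    R = H @ W2 - X
    gW2, gW1 = H.T @ R, X.T @ (R @ W2.T)
    W1 -= eta * gW1
    W2 -= eta * gW2

# First M principal directions from the SVD of the centred data matrix.
V = np.linalg.svd(X, full_matrices=False)[2][:M].T       # (D, M), orthonormal columns

# Cosines of the principal angles between span(W1) and the principal subspace;
# values close to 1 mean the two subspaces coincide.
Q1 = np.linalg.qr(W1)[0]                                 # orthonormal basis for span(W1)
print("principal-angle cosines:", np.linalg.svd(Q1.T @ V, compute_uv=False))

# The learned basis itself is generally neither orthogonal nor normalized.
print("W1^T W1 (generally not the identity):\n", W1.T @ W1)
```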
It might be thought that the limitations of a linear dimensionality reduction could be overcome by using nonlinear (sigmoidal) activation functions for the hidden units in the network in Figure 12.18. However, even with nonlinear hidden units, the minimum error solution is again given by the projection onto the principal component subspace (Bourlard and Kamp, 1988). There is therefore no advantage in using two-layer neural networks to perform dimensionality reduction. Standard techniques for principal component analysis (based on singular value decomposition) are guaranteed to give the correct solution in finite time, and they also generate an ordered set of eigenvalues with corresponding orthonormal eigenvectors.
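
By contrast, the PCA solution can be written down directly from the singular value decomposition of the centred data matrix. The short sketch below (again illustrative; the toy data and sizes are arbitrary) computes the ordered eigenvalues, checks that the eigenvectors are orthonormal, and forms the reconstruction from the first M principal components.

```python
# PCA computed directly via the singular value decomposition: the solution
# is exact, the eigenvalues come out in decreasing order, and the
# eigenvectors are orthonormal. (Toy data and sizes are illustrative.)
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # toy data, N=200, D=5
X -= X.mean(axis=0)                                       # centre the data

U, S, Vt = np.linalg.svd(X, full_matrices=False)
eigenvalues = S ** 2 / X.shape[0]        # eigenvalues of the sample covariance, decreasing
eigenvectors = Vt.T                      # columns are the orthonormal principal directions

print("ordered eigenvalues:", eigenvalues)
print("orthonormality check (V^T V = I):",
      np.allclose(eigenvectors.T @ eigenvectors, np.eye(5)))

M = 2                                    # project onto the first M principal components
Z = X @ eigenvectors[:, :M]              # latent coordinates
X_rec = Z @ eigenvectors[:, :M].T        # reconstruction from the principal subspace
print("PCA reconstruction error:", 0.5 * np.sum((X_rec - X) ** 2))
```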