Understanding Machine Learning: From Theory to Algorithms

Neural Networks


SGD for Neural Networks

parameters:
  number of iterations τ
  step size sequence η_1, η_2, ..., η_τ
  regularization parameter λ > 0
input:
  layered graph (V, E)
  differentiable activation function σ : R → R
initialize:
  choose w^(1) ∈ R^|E| at random
  (from a distribution s.t. w^(1) is close enough to 0)
for i = 1, 2, ..., τ
  sample (x, y) ∼ D
  calculate gradient v_i = backpropagation(x, y, w^(i), (V, E), σ)
  update w^(i+1) = w^(i) − η_i (v_i + λ w^(i))
output:
  w̄ is the best performing w^(i) on a validation set
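
Each iteration takes a stochastic gradient step on the regularized objective: v_i is the loss gradient returned by backpropagation, and the λw^(i) term is the gradient of the (λ/2)‖w‖² regularizer. Below is a minimal NumPy sketch of this loop, assuming placeholder callables grad_fn (standing in for backpropagation) and sample_fn (drawing one example from D); these names and the 0.01 initialization scale are our choices, not the book's:

    import numpy as np

    def sgd_for_networks(grad_fn, sample_fn, dim, step_sizes, lam, seed=0):
        """Regularized SGD: w^(i+1) = w^(i) - eta_i * (v_i + lam * w^(i))."""
        rng = np.random.default_rng(seed)
        w = 0.01 * rng.standard_normal(dim)   # w^(1) random and close to 0
        iterates = [w.copy()]
        for eta in step_sizes:                # tau iterations, one per step size
            x, y = sample_fn()                # sample (x, y) ~ D
            v = grad_fn(x, y, w)              # loss gradient at the current w
            w = w - eta * (v + lam * w)
            iterates.append(w.copy())
        return iterates                       # keep them all; pick the best later

Returning every iterate mirrors the output step: w̄ is whichever w^(i) performs best on a held-out validation set, not necessarily the last one.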

Backpropagation

input:
  example (x, y), weight vector w, layered graph (V, E),
  activation function σ : R → R
initialize:
  denote the layers of the graph V_0, ..., V_T, where V_t = {v_{t,1}, ..., v_{t,k_t}}
  define W_{t,i,j} as the weight of (v_{t,j}, v_{t+1,i})
  (where we set W_{t,i,j} = 0 if (v_{t,j}, v_{t+1,i}) ∉ E)
forward:
  set o_0 = x
  for t = 1, ..., T
    for i = 1, ..., k_t
      set a_{t,i} = ∑_{j=1}^{k_{t−1}} W_{t−1,i,j} o_{t−1,j}
      set o_{t,i} = σ(a_{t,i})
backward:
  set δ_T = o_T − y
  for t = T−1, T−2, ..., 1
    for i = 1, ..., k_t
      set δ_{t,i} = ∑_{j=1}^{k_{t+1}} W_{t,j,i} δ_{t+1,j} σ′(a_{t+1,j})
output:
  foreach edge (v_{t−1,j}, v_{t,i}) ∈ E
    set the partial derivative to δ_{t,i} σ′(a_{t,i}) o_{t−1,j}
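
Because the sums over j in the forward and backward passes are matrix-vector products, the whole procedure collapses to a few lines of linear algebra for a fully connected layered network: δ_t = W_tᵀ (δ_{t+1} ⊙ σ′(a_{t+1})). Below is a minimal NumPy sketch under that assumption, reading δ_T = o_T − y as the gradient of the squared loss ½‖o_T − y‖², with weights[t] holding W_t as a k_{t+1} × k_t matrix (a missing edge is simply a zero entry, per the initialize step); the function and argument names are ours, not the book's:

    import numpy as np

    def backpropagation(x, y, weights, sigma, sigma_prime):
        """Per-layer gradients of the squared loss 0.5 * ||o_T - y||^2."""
        T = len(weights)
        o = [np.asarray(x, dtype=float)]          # forward: o_0 = x
        a = [None]                                # a_0 is never used
        for t in range(1, T + 1):
            a.append(weights[t - 1] @ o[t - 1])   # a_t = W_{t-1} o_{t-1}
            o.append(sigma(a[t]))                 # o_t = sigma(a_t)
        delta = [None] * (T + 1)
        delta[T] = o[T] - y                       # backward: delta_T = o_T - y
        for t in range(T - 1, 0, -1):             # delta_t = W_t^T (delta_{t+1} * sigma'(a_{t+1}))
            delta[t] = weights[t].T @ (delta[t + 1] * sigma_prime(a[t + 1]))
        # output: d(loss)/dW_{t-1,i,j} = delta_{t,i} * sigma'(a_{t,i}) * o_{t-1,j}
        return [np.outer(delta[t] * sigma_prime(a[t]), o[t - 1]) for t in range(1, T + 1)]

With, say, sigma = np.tanh and sigma_prime = lambda a: 1.0 - np.tanh(a)**2, the returned per-layer matrices can be flattened and concatenated into the gradient vector v_i consumed by the SGD loop above.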