270 Neural Networks
and

o_{t+1,j}(x) = σ(a_{t+1,j}(x)).

That is, the input to v_{t+1,j} is a weighted sum of the outputs of the neurons in V_t that are connected to v_{t+1,j}, where weighting is according to w, and the output of v_{t+1,j} is simply the application of the activation function σ on its input.
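The layer-by-layer computation above can be sketched in code. This is a minimal illustration, not anything from the text: edge weights are represented as one matrix per layer (a zero entry standing in for an absent edge), and the sigmoid is used as an example activation function σ. The function name `forward` and this matrix encoding are assumptions made for the sketch.

```python
import numpy as np

def forward(x, weights, sigma=lambda a: 1.0 / (1.0 + np.exp(-a))):
    """Compute the network output layer by layer.

    `weights` is a list of matrices; weights[t][j, r] holds the weight
    w((v_{t,r}, v_{t+1,j})) on the edge from neuron r in layer t to
    neuron j in layer t+1, with 0 where no edge exists.
    """
    o = np.asarray(x, dtype=float)   # outputs of layer V_0 (the inputs)
    for W in weights:
        a = W @ o                    # a_{t+1,j}: weighted sum of layer-t outputs
        o = sigma(a)                 # o_{t+1,j} = sigma(a_{t+1,j})
    return o
```

Note that a neuron whose weight row is all zeros (no incoming edges) receives input a = 0 and outputs σ(0), matching the remark about the hidden-layer neuron in the illustration below.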
Layers V_1, ..., V_{T−1} are often called hidden layers. The top layer, V_T, is called the output layer. In simple prediction problems the output layer contains a single neuron whose output is the output of the network.
We refer to T as the number of layers in the network (excluding V_0), or the "depth" of the network. The size of the network is |V|. The "width" of the network is max_t |V_t|. An illustration of a layered feedforward neural network of depth 2, size 10, and width 5, is given in the following. Note that there is a neuron in the hidden layer that has no incoming edges. This neuron will output the constant σ(0).
[Figure: a layered feedforward network. The input layer V_0 contains v_{0,1}, v_{0,2}, v_{0,3} (receiving x_1, x_2, x_3) and a constant neuron v_{0,4}; the hidden layer V_1 contains v_{1,1}, ..., v_{1,5}; the output layer V_2 contains the single output neuron v_{2,1}.]
20.2 Learning Neural Networks
Once we have specified a neural network by (V, E, σ, w), we obtain a function h_{V,E,σ,w} : R^{|V_0|−1} → R^{|V_T|}. Any set of such functions can serve as a hypothesis class for learning. Usually, we define a hypothesis class of neural network predictors by fixing the graph (V, E) as well as the activation function σ and letting the hypothesis class be all functions of the form h_{V,E,σ,w} for some w : E → R. The triplet (V, E, σ) is often called the architecture of the network. We denote the hypothesis class by

H_{V,E,σ} = {h_{V,E,σ,w} : w is a mapping from E to R}.   (20.1)
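To make the parameterization in (20.1) concrete, the following sketch fixes an architecture and produces one member h_{V,E,σ,w} of the class per weight assignment w. The representation (one matrix per layer, tanh as the example σ) and the helper name `make_hypothesis` are assumptions for illustration only.

```python
import numpy as np

def make_hypothesis(weights, sigma=np.tanh):
    """Given one weight assignment w (a list of matrices matching a fixed
    architecture (V, E, sigma)), return the predictor h_{V,E,sigma,w}."""
    def h(x):
        o = np.asarray(x, dtype=float)
        for W in weights:
            o = sigma(W @ o)   # one layer of the forward computation
        return o
    return h

# Two members of the same class H_{V,E,sigma}: identical architecture
# (matrix shapes), different weight mappings w.
h1 = make_hypothesis([np.ones((3, 2)), np.ones((1, 3))])
h2 = make_hypothesis([np.zeros((3, 2)), np.zeros((1, 3))])
```

The point of the construction is that learning searches over w alone, while (V, E, σ) stays fixed.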