270 Neural Networks
and

o_{t+1,j}(x) = σ(a_{t+1,j}(x)).

That is, the input to v_{t+1,j} is a weighted sum of the outputs of the neurons in V_t that are connected to v_{t+1,j}, where weighting is according to w, and the output of v_{t+1,j} is simply the application of the activation function σ on its input.
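The layer-by-layer computation above can be sketched in code. This is a minimal illustration, not anything from the text: edge weights are represented as one matrix per layer (a zero entry standing in for an absent edge), and the sigmoid is used as an example activation function σ. The function name `forward` and this matrix encoding are assumptions made for the sketch.

```python
import numpy as np

def forward(x, weights, sigma=lambda a: 1.0 / (1.0 + np.exp(-a))):
    """Compute the network output layer by layer.

    `weights` is a list of matrices; weights[t][j, r] holds the weight
    w((v_{t,r}, v_{t+1,j})) on the edge from neuron r in layer t to
    neuron j in layer t+1, with 0 where no edge exists.
    """
    o = np.asarray(x, dtype=float)   # outputs of layer V_0 (the inputs)
    for W in weights:
        a = W @ o                    # a_{t+1,j}: weighted sum of layer-t outputs
        o = sigma(a)                 # o_{t+1,j} = sigma(a_{t+1,j})
    return o
```

Note that a neuron whose weight row is all zeros (no incoming edges) receives input a = 0 and outputs σ(0), matching the remark about the hidden-layer neuron in the illustration below.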
Layers V_1, ..., V_{T−1} are often called hidden layers. The top layer, V_T, is called the output layer. In simple prediction problems the output layer contains a single neuron whose output is the output of the network.
We refer to T as the number of layers in the network (excluding V_0), or the "depth" of the network. The size of the network is |V|. The "width" of the network is max_t |V_t|. An illustration of a layered feedforward neural network of depth 2, size 10, and width 5, is given in the following. Note that there is a neuron in the hidden layer that has no incoming edges. This neuron will output the constant σ(0).
[Figure: a layered feedforward network. The input layer V_0 contains v_{0,1}, v_{0,2}, v_{0,3} (receiving x_1, x_2, x_3) and a constant neuron v_{0,4}; the hidden layer V_1 contains v_{1,1}, ..., v_{1,5}; the output layer V_2 contains the single output neuron v_{2,1}.]
20.2 Learning Neural Networks
Once we have specified a neural network by (V, E, σ, w), we obtain a function h_{V,E,σ,w} : R^{|V_0|−1} → R^{|V_T|}. Any set of such functions can serve as a hypothesis class for learning. Usually, we define a hypothesis class of neural network predictors by fixing the graph (V, E) as well as the activation function σ and letting the hypothesis class be all functions of the form h_{V,E,σ,w} for some w : E → R. The triplet (V, E, σ) is often called the architecture of the network. We denote the hypothesis class by

H_{V,E,σ} = {h_{V,E,σ,w} : w is a mapping from E to R}.   (20.1)
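To make the parameterization in (20.1) concrete, the following sketch fixes an architecture and produces one member h_{V,E,σ,w} of the class per weight assignment w. The representation (one matrix per layer, tanh as the example σ) and the helper name `make_hypothesis` are assumptions for illustration only.

```python
import numpy as np

def make_hypothesis(weights, sigma=np.tanh):
    """Given one weight assignment w (a list of matrices matching a fixed
    architecture (V, E, sigma)), return the predictor h_{V,E,sigma,w}."""
    def h(x):
        o = np.asarray(x, dtype=float)
        for W in weights:
            o = sigma(W @ o)   # one layer of the forward computation
        return o
    return h

# Two members of the same class H_{V,E,sigma}: identical architecture
# (matrix shapes), different weight mappings w.
h1 = make_hypothesis([np.ones((3, 2)), np.ones((1, 3))])
h2 = make_hypothesis([np.zeros((3, 2)), np.zeros((1, 3))])
```

The point of the construction is that learning searches over w alone, while (V, E, σ) stays fixed.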