Understanding Machine Learning: From Theory to Algorithms




and

o_{t+1,j}(x) = σ(a_{t+1,j}(x)).

That is, the input to v_{t+1,j} is a weighted sum of the outputs of the neurons in V_t
that are connected to v_{t+1,j}, where weighting is according to w, and the output
of v_{t+1,j} is simply the application of the activation function σ on its input.
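
The following is a minimal NumPy sketch (not part of the text) of this layer-to-layer computation. It assumes the weights on the edges between V_t and V_{t+1} are collected in a matrix W with W[r, j] standing for w((v_{t,r}, v_{t+1,j})), zero entries marking absent edges, and it uses the sigmoid function as the activation σ; all of these are illustrative choices.

    import numpy as np

    def sigmoid(a):
        # one common choice for the activation function sigma
        return 1.0 / (1.0 + np.exp(-a))

    def layer_step(o_prev, W, sigma=sigmoid):
        # o_prev : vector of outputs o_{t,r}(x) of the neurons in layer V_t
        # W      : W[r, j] = w((v_{t,r}, v_{t+1,j})), 0 where no edge exists
        a = o_prev @ W        # a_{t+1,j}(x): weighted sum of connected outputs
        return sigma(a)       # o_{t+1,j}(x) = sigma(a_{t+1,j}(x))

Note that a neuron whose column of W is identically zero receives a_{t+1,j}(x) = 0 and therefore outputs the constant σ(0), which is exactly the behavior of a neuron with no incoming edges.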
Layers V_1, ..., V_{T-1} are often called hidden layers. The top layer, V_T, is called
the output layer. In simple prediction problems the output layer contains a single
neuron whose output is the output of the network.

We refer to T as the number of layers in the network (excluding V_0), or the
"depth" of the network. The size of the network is |V|. The "width" of the
network is max_t |V_t|. An illustration of a layered feedforward neural network of
depth 2, size 10, and width 5, is given in the following. Note that there is a
neuron in the hidden layer that has no incoming edges. This neuron will output
the constant σ(0).

[Figure: a layered feedforward network of depth 2, size 10, and width 5. The input layer V_0 contains v_{0,1}, v_{0,2}, v_{0,3} (fed by x_1, x_2, x_3) and the constant neuron v_{0,4}; the hidden layer V_1 contains v_{1,1}, ..., v_{1,5}; the output layer V_2 contains the single output neuron v_{2,1}.]

20.2 Learning Neural Networks


Once we have specified a neural network by (V, E, σ, w), we obtain a function
h_{V,E,σ,w} : R^{|V_0|-1} → R^{|V_T|}. Any set of such functions can serve as a hypothesis
class for learning. Usually, we define a hypothesis class of neural network predic-
tors by fixing the graph (V, E) as well as the activation function σ and letting
the hypothesis class be all functions of the form h_{V,E,σ,w} for some w : E → R.
The triplet (V, E, σ) is often called the architecture of the network. We denote
the hypothesis class by

H_{V,E,σ} = {h_{V,E,σ,w} : w is a mapping from E to R}.    (20.1)
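
As an illustration (not from the book), one can read (20.1) in code: the architecture is held fixed, and each weight assignment w yields one hypothesis h_{V,E,σ,w}. The layer shapes and the sigmoid activation below are arbitrary choices for the sketch, matching the depth-2 network drawn above.

    import numpy as np

    def make_hypothesis(weights, sigma=lambda a: 1.0 / (1.0 + np.exp(-a))):
        # weights: a list of matrices, one per pair of consecutive layers;
        # weights[t][r, j] plays the role of w((v_{t,r}, v_{t+1,j})),
        # with zero entries for edges absent from E.
        def h(x):
            o = np.append(x, 1.0)   # layer V_0: the input x plus the constant neuron
            for W in weights:       # forward pass through V_1, ..., V_T
                o = sigma(o @ W)
            return o                # outputs of the output layer V_T
        return h

    # Two members of H_{V,E,sigma} sharing the same architecture but different w:
    h1 = make_hypothesis([np.zeros((4, 5)), np.zeros((5, 1))])
    h2 = make_hypothesis([np.ones((4, 5)), np.ones((5, 1))])
    print(h1(np.array([1.0, -2.0, 0.5])), h2(np.array([1.0, -2.0, 0.5])))

Ranging over all such weight assignments while keeping the architecture fixed is what produces the class H_{V,E,σ}.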