Understanding Machine Learning: From Theory to Algorithms


That is, the parameters specifying a hypothesis in the hypothesis class are the
weights over the edges of the network.
We can now study the approximation error, estimation error, and optimization
error of such hypothesis classes. In Section 20.3 we study the approximation
error of $\mathcal{H}_{V,E,\sigma}$ by studying what type of functions hypotheses in $\mathcal{H}_{V,E,\sigma}$ can
implement, in terms of the size of the underlying graph. In Section 20.4 we
study the estimation error of $\mathcal{H}_{V,E,\sigma}$, for the case of binary classification (i.e.,
$|V_T| = 1$ and $\sigma$ is the sign function), by analyzing its VC dimension. Finally, in
Section 20.5 we show that it is computationally hard to learn the class $\mathcal{H}_{V,E,\sigma}$,
even if the underlying graph is small, and in Section 20.6 we present the most
commonly used heuristic for training $\mathcal{H}_{V,E,\sigma}$.

20.3 The Expressive Power of Neural Networks


In this section we study the expressive power of neural networks, namely, what
type of functions can be implemented using a neural network. More concretely,
we will fix some architecture, $(V,E,\sigma)$, and will study what functions hypotheses
in $\mathcal{H}_{V,E,\sigma}$ can implement, as a function of the size of $V$.
We start the discussion by studying which type of Boolean functions (i.e.,
functions from $\{\pm 1\}^n$ to $\{\pm 1\}$) can be implemented by $\mathcal{H}_{V,E,\mathrm{sign}}$. Observe that
for every computer in which real numbers are stored using $b$ bits, whenever we
calculate a function $f : \mathbb{R}^n \to \mathbb{R}$ on such a computer we in fact calculate a
function $g : \{\pm 1\}^{nb} \to \{\pm 1\}^b$. Therefore, studying which Boolean functions can
be implemented by $\mathcal{H}_{V,E,\mathrm{sign}}$ can tell us which functions can be implemented on
a computer that stores real numbers using $b$ bits.
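
To make this observation concrete, here is a minimal Python sketch (an
illustration, not part of the text; the function names are ours) that exposes
the $b = 64$ bit pattern of an IEEE-754 double as a vector in $\{\pm 1\}^{64}$.
Composing it with any float-valued function shows that what the machine really
computes is a Boolean function of the input bits.

    import struct

    def float_to_pm1(x: float) -> list[int]:
        """View the 64-bit IEEE-754 representation of x as a vector in {+-1}^64."""
        (bits,) = struct.unpack("<Q", struct.pack("<d", x))  # raw 64-bit pattern
        return [1 if (bits >> i) & 1 else -1 for i in range(64)]

    def pm1_to_float(v: list[int]) -> float:
        """Inverse map: decode a {+-1}^64 vector back into a double."""
        bits = sum(1 << i for i in range(64) if v[i] == 1)
        return struct.unpack("<d", struct.pack("<Q", bits))[0]

    # Any f: R -> R the machine computes induces a g: {+-1}^64 -> {+-1}^64.
    g = lambda v: float_to_pm1(pm1_to_float(v) ** 2)  # here f(x) = x^2
    assert pm1_to_float(g(float_to_pm1(3.0))) == 9.0
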
We begin with a simple claim, showing that without restricting the size of the
network, every Boolean function can be implemented using a neural network of
depth 2.

Claim 20.1 For every $n$, there exists a graph $(V,E)$ of depth 2, such that
$\mathcal{H}_{V,E,\mathrm{sign}}$ contains all functions from $\{\pm 1\}^n$ to $\{\pm 1\}$.

Proof We construct a graph with $|V_0| = n+1$, $|V_1| = 2^n + 1$, and $|V_2| = 1$. Let
$E$ be all possible edges between adjacent layers. Now, let $f : \{\pm 1\}^n \to \{\pm 1\}$
be some Boolean function. We need to show that we can adjust the weights so
that the network will implement $f$. Let $u_1, \ldots, u_k$ be all vectors in $\{\pm 1\}^n$ on
which $f$ outputs 1. Observe that for every $i$ and every $x \in \{\pm 1\}^n$, if $x \neq u_i$
then $\langle x, u_i \rangle \le n-2$, and if $x = u_i$ then $\langle x, u_i \rangle = n$ (each coordinate on which
$x$ and $u_i$ agree contributes $1$ to the inner product, and each coordinate on
which they disagree contributes $-1$). It follows that the function
$g_i(x) = \mathrm{sign}(\langle x, u_i \rangle - n + 1)$ equals 1 if and only if $x = u_i$. Hence we can
adapt the weights between $V_0$ and $V_1$ so that for every $i \in [k]$, the neuron $v_{1,i}$
implements the function $g_i(x)$. Next, we observe that $f(x)$ is the disjunction of
the functions $g_i(x)$, and therefore can be written as
$f(x) = \mathrm{sign}\left( \sum_{i=1}^{k} g_i(x) + k - 1 \right)$, which concludes our proof.
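
As a sanity check on this construction, the following NumPy sketch (our
illustration; the names build_depth2_network and predict are ours, not the
book's) builds the weights from the proof for an arbitrary Boolean function
and verifies the resulting network on all inputs. For compactness it
instantiates only the $k$ hidden neurons $v_{1,1}, \ldots, v_{1,k}$ that the
proof actually uses (the remaining neurons of $V_1$ would simply get weight 0
on their outgoing edge), and the constant neurons are replaced by explicit
bias terms.

    import itertools
    import numpy as np

    def build_depth2_network(f, n):
        """Weights for the depth-2 sign network of Claim 20.1.

        f maps tuples in {+-1}^n to +-1.  Returns (W1, b1, w2, b2) such that
        sign(w2 @ sign(W1 @ x + b1) + b2) == f(x) for every x in {+-1}^n.
        """
        # u_1, ..., u_k: all inputs on which f outputs 1.
        us = [u for u in itertools.product([1, -1], repeat=n) if f(u) == 1]
        k = len(us)
        # Hidden neuron i computes g_i(x) = sign(<x, u_i> - n + 1),
        # which equals 1 iff x == u_i.
        W1 = np.array(us, dtype=float).reshape(k, n)
        b1 = np.full(k, -(n - 1.0))
        # The output neuron implements the disjunction of the g_i's:
        # sign(sum_i g_i(x) + k - 1).
        w2 = np.ones(k)
        b2 = k - 1.0
        return W1, b1, w2, b2

    def predict(x, W1, b1, w2, b2):
        hidden = np.sign(W1 @ np.asarray(x, dtype=float) + b1)
        return int(np.sign(w2 @ hidden + b2))

    # Check the construction against every Boolean function on n = 2 inputs
    # (all 16 of them), on all 4 possible inputs.
    n = 2
    inputs = list(itertools.product([1, -1], repeat=n))
    for outputs in itertools.product([1, -1], repeat=len(inputs)):
        f = dict(zip(inputs, outputs)).__getitem__
        params = build_depth2_network(f, n)
        assert all(predict(x, *params) == f(x) for x in inputs)

Note that, as in the proof, no sign argument is ever 0: $\langle x, u_i \rangle - n + 1$
is always odd, and the output pre-activation is $1$ when some $g_i$ fires and
$-1$ otherwise.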