172 CHAPTER 5. NEURAL NETWORKS FOR CONTROL
only one neuron $C$. For all neurons, the “activation function” $f$ is taken to be
$$
f(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{otherwise} \end{cases}
$$
Consider, for example, $x_1 = x_2 = 1$. We expect the network to produce the output $y = 0$. The output $y$ is computed in several steps. The net input to neuron $A$ is
$$
w_{1A}x_1 + w_{2A}x_2 - b_A
$$
so that the output of neuron $A$ is $y_1 = f(w_{1A}x_1 + w_{2A}x_2 - b_A)$. Similarly, the output of neuron $B$ is $y_2 = f(w_{1B}x_1 + w_{2B}x_2 - b_B)$. Hence, the net input to neuron $C$ from neurons $A$ and $B$ must satisfy the condition
$$
w_{AC}\, f(w_{1A}x_1 + w_{2A}x_2 - b_A) + w_{BC}\, f(w_{1B}x_1 + w_{2B}x_2 - b_B) - b_C < 0
$$
to result in the output $y = 0$.
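The steps above can be traced in a few lines of code. The Python sketch below evaluates the two-layer network with the step activation $f$. The weight and threshold values are hypothetical, chosen only for illustration (they are not the book's values), so that the net input to $C$ is negative for $x_1 = x_2 = 1$.

```python
# Forward pass of the two-layer threshold network described above:
# hidden neurons A and B feed the output neuron C.
# The weight and threshold values are HYPOTHETICAL, chosen only so that
# the network maps (1, 1) to 0 as required; they are not taken from the text.

def f(x: float) -> int:
    """Step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

# Hypothetical parameters (w_1A, w_2A, b_A, and so on).
W1A, W2A, BA = 1.0, 1.0, 0.5   # neuron A fires if at least one input is 1
W1B, W2B, BB = 1.0, 1.0, 1.5   # neuron B fires only if both inputs are 1
WAC, WBC, BC = 1.0, -1.0, 0.5  # neuron C fires when A fires and B does not

def network(x1: float, x2: float) -> int:
    y1 = f(W1A * x1 + W2A * x2 - BA)   # output of neuron A
    y2 = f(W1B * x1 + W2B * x2 - BB)   # output of neuron B
    net_c = WAC * y1 + WBC * y2 - BC   # net input to neuron C
    return f(net_c)                    # y = 0 exactly when net_c < 0

if __name__ == "__main__":
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x1, x2, "->", network(x1, x2))
```

With these hypothetical values the network outputs 0, 1, 1, 0 for the inputs (0, 0), (0, 1), (1, 0), (1, 1); for $x_1 = x_2 = 1$ the net input to $C$ is $1 - 1 - 0.5 = -0.5 < 0$, so $y = 0$.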
The lesson is this: adding layers increases the computational power of neural networks. In other words, single-layer neural networks are not enough for general problems; multilayer neural networks should be considered.
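A short derivation shows why a single layer can fall short. Take as target the exclusive-or pattern, with outputs 0, 1, 1, 0 for the inputs (0, 0), (0, 1), (1, 0), (1, 1). This target is an assumption here: it agrees with the requirement $y = 0$ for $x_1 = x_2 = 1$ in the example above, but the full table lies outside this excerpt. Writing $w_1$, $w_2$, $b$ for the weights and threshold of one hypothetical neuron $y = f(w_1x_1 + w_2x_2 - b)$ with the step activation $f$, the four required outputs give

$$
\begin{aligned}
(0,0) \mapsto 0 &\;\Rightarrow\; -b < 0 \;\Rightarrow\; b > 0,\\
(1,0) \mapsto 1 &\;\Rightarrow\; w_1 - b \ge 0,\\
(0,1) \mapsto 1 &\;\Rightarrow\; w_2 - b \ge 0,\\
(1,1) \mapsto 0 &\;\Rightarrow\; w_1 + w_2 - b < 0.
\end{aligned}
$$

Adding the second and third inequalities gives $w_1 + w_2 - b \ge 2b - b = b > 0$, which contradicts the fourth. No choice of $w_1$, $w_2$, $b$ works, so at least one hidden layer is needed for this pattern.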
5.3 Learning capability
Artificial neural networks are simplified mathematical models of brain-like systems that function as parallel, distributed computing networks. As stated earlier, neural networks need to be taught or trained from examples before they can be put to use.
As we will see, the learning problem is nothing more than choosing a function from a given class of functions according to some given criteria. First, we need to know what functions can be implemented or represented by a neural network. Only after we know the answer to that can we search for ways to design appropriate neural networks.
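To make the idea of choosing a function from a class concrete, here is a minimal sketch under stated assumptions. The class is a finite, hypothetical set of single step-activation neurons $y = f(w_1x_1 + w_2x_2 - b)$ whose parameters lie on a small grid, the training examples are the logical AND of two binary inputs, and the criterion is the number of misclassified examples. The helper names `errors` and `select` are illustrative only. Practical training methods search continuous parameter spaces rather than an exhaustive grid; the exhaustive search here just keeps the example short.

```python
# Learning viewed as choosing a function from a given class by a criterion.
# Class (hypothetical, for illustration): single step-activation neurons
#   y = f(w1*x1 + w2*x2 - b) with w1, w2, b taken from a small grid.
# Criterion (also an assumption): number of misclassified training examples.
from itertools import product

def f(x: float) -> int:
    """Step activation: 1 if x >= 0, else 0."""
    return 1 if x >= 0 else 0

# Training examples: the logical AND of two binary inputs.
EXAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
GRID = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]

def errors(params, examples):
    """Count examples on which the neuron with these parameters is wrong."""
    w1, w2, b = params
    return sum(f(w1 * x1 + w2 * x2 - b) != y for (x1, x2), y in examples)

def select(examples):
    """Return the grid parameters (w1, w2, b) with the fewest errors."""
    return min(product(GRID, GRID, GRID), key=lambda p: errors(p, examples))

if __name__ == "__main__":
    best = select(EXAMPLES)
    print("selected (w1, w2, b):", best, "errors:", errors(best, EXAMPLES))
```

Swapping in the exclusive-or examples leaves at least one misclassified example for every parameter choice, since no single threshold neuron can produce that pattern.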
If we address general relationships expressed as functions from $\mathbb{R}^n$ to $\mathbb{R}^m$, then neural networks with smooth activation functions (rather than the step functions used in the early development of perceptrons for binary outputs) can approximate continuous functions defined on compact sets. This universal approximation property of neural networks was proved using the Stone-Weierstrass theorem in the theory of approximation of functions (see Chapter 3). Specifically, every continuous function whose domain is a closed and bounded (that is, compact) subset of $\mathbb{R}^n$ can be approximated to any degree of accuracy by a neural network with one hidden layer of sigmoid or hyperbolic tangent activation functions. This theoretical result means the following: it is possible to design an appropriate neural network that approximates any continuous function on a compact domain as closely as desired. This is only an existence theorem, not a constructive one; it does not say how many hidden neurons are needed or how to find the weights. The significance of the theorem is that it is reassuring, since most functions of practical interest are continuous.
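Although the theorem is not constructive, in one dimension a simple explicit construction already shows the idea. The sketch below is an illustration only, not the Stone-Weierstrass argument cited above. It approximates a continuous function `g` on a closed interval $[a, b]$ with one hidden layer of logistic sigmoid units and a linear output unit, placing one steep sigmoid "step" at each midpoint of an equally spaced grid; each step's output weight is the jump of `g` between neighboring grid points. The names `build_network`, `max_error`, and the `steepness` parameter are hypothetical choices made for this example.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic sigmoid, written to avoid overflow in math.exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_network(g, a: float, b: float, n_hidden: int, steepness: float = 10.0):
    """Staircase construction of a one-hidden-layer sigmoid network for g on [a, b].

    Hidden unit k computes sigmoid(s * x - s * m_k), i.e. input weight s and a
    threshold at the midpoint m_k of grid points x_{k-1} and x_k. Its output
    weight is the jump g(x_k) - g(x_{k-1}); the output unit is linear with
    bias g(a).
    """
    h = (b - a) / n_hidden
    s = steepness / h                      # steeper sigmoids for finer grids
    xs = [a + k * h for k in range(n_hidden + 1)]
    mids = [(xs[k - 1] + xs[k]) / 2 for k in range(1, n_hidden + 1)]
    jumps = [g(xs[k]) - g(xs[k - 1]) for k in range(1, n_hidden + 1)]
    bias = g(xs[0])

    def net(x: float) -> float:
        return bias + sum(v * sigmoid(s * (x - m)) for v, m in zip(jumps, mids))

    return net

def max_error(g, net, a: float, b: float, samples: int = 1001) -> float:
    """Largest |g(x) - net(x)| over equally spaced sample points in [a, b]."""
    pts = [a + (b - a) * i / (samples - 1) for i in range(samples)]
    return max(abs(g(x) - net(x)) for x in pts)

if __name__ == "__main__":
    a, b = 0.0, math.pi
    for n in (5, 20, 200):
        net = build_network(math.sin, a, b, n)
        print(f"{n:4d} hidden units: max error {max_error(math.sin, net, a, b):.4f}")
```

Running it with `g = math.sin` on $[0, \pi]$ should show the maximum sampled error shrinking as hidden units are added; for this construction the error is roughly proportional to the grid spacing $(b - a)/n$ times the largest slope of `g`.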
