5.3. LEARNING CAPABILITY
Then comes the basic question: how do we find an appropriate neural network
to represent a given relationship? To answer it, we need some information
about the relationship. In the context of learning, this information is
provided in the form of a training set consisting of known input-output
pairs. Learning from information of this type is referred to as supervised
learning, or learning with a teacher.
Specifically, the learning problem is this. Suppose we have a set $T$ of training
examples
$$T = \{(x^q, y^q),\ q = 1, \dots, N\}$$
$$x^q = (x^q_1, x^q_2, \dots, x^q_n) \in \mathbb{R}^n$$
$$y^q = (y^q_1, y^q_2, \dots, y^q_m) \in \mathbb{R}^m$$
and we wish to use these data to adjust the weights and biases of a neural network
with $n$ input nodes and $m$ output neurons, with one hidden layer, for example.
Then, from a common-sense viewpoint, we should compare the output
$$o^q = (o^q_1, o^q_2, \dots, o^q_m) \in \mathbb{R}^m$$
produced from the input pattern $x^q$ with the corresponding target output $y^q$.
This is the same as approximating a function from which the input-output pairs
have been drawn.
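The comparison of network output with target output can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logistic hidden activation, the linear output layer, the hidden-layer size `h`, the half-squared-error measure, and the XOR-style training set are all choices made here, not taken from the text, and the function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic activation (an assumed choice for the hidden layer)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Output o of a one-hidden-layer network for a single input pattern x.

    W1: hidden weight rows (each of length n), b1: hidden biases,
    W2: output weight rows (each of length h), b2: output biases.
    The output layer is linear (an assumption of this sketch).
    """
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * hj for w, hj in zip(row, hidden)) + b
            for row, b in zip(W2, b2)]

def pattern_error(o, y):
    """Half the squared Euclidean distance between output o and target y."""
    return 0.5 * sum((oi - yi) ** 2 for oi, yi in zip(o, y))

def total_error(T, W1, b1, W2, b2):
    """Sum of the per-pattern errors over the whole training set T."""
    return sum(pattern_error(forward(x, W1, b1, W2, b2), y) for x, y in T)

# Hypothetical training set with n = 2 inputs, m = 1 output, N = 4 pairs.
T = [((0.0, 0.0), (0.0,)), ((0.0, 1.0), (1.0,)),
     ((1.0, 0.0), (1.0,)), ((1.0, 1.0), (0.0,))]

# Random, untrained weights and biases; h = 3 hidden neurons is arbitrary.
rng = random.Random(0)
n, h, m = 2, 3, 1
W1 = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(h)]
b1 = [rng.uniform(-1, 1) for _ in range(h)]
W2 = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(m)]
b2 = [rng.uniform(-1, 1) for _ in range(m)]

for x, y in T:
    o = forward(x, W1, b1, W2, b2)
    print("input", x, "target", y, "output", o, "error", pattern_error(o, y))
print("total error:", total_error(T, W1, b1, W2, b2))
```

With untrained weights the total error is typically far from zero; learning amounts to adjusting `W1`, `b1`, `W2`, `b2` so that this quantity decreases.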
The error correction idea is simple: whether or not the weights are changed
depends on the comparison of the actual output with the desired output.
This idea is formulated in terms of a suitable overall performance measure. In
a general sense, the learning problem may be stated as follows: Given a class of
functions (here an architecture of neural networks) and a performance criterion,
find a function in this class that optimizes the performance criterion on the basis
of a set of training examples.
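As a minimal sketch of this formulation, take the class of functions to be the single linear units $o = wx + b$ and the performance criterion to be the mean squared error on the training set. Both choices, the learning rate, the epoch count, and the function names are assumptions made for illustration. The update used is the delta (least-mean-squares) rule, which changes the weights in proportion to the difference between desired and actual output.

```python
def train_linear_unit(T, rate=0.1, epochs=500):
    """Error-correction training of a single linear unit o = w*x + b.

    After each training pair the weights move in proportion to the error
    (desired output minus actual output); zero error means no change.
    """
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in T:
            o = w * x + b          # actual output
            err = y - o            # comparison with the desired output
            w += rate * err * x    # delta-rule update of the weight
            b += rate * err        # delta-rule update of the bias
    return w, b

def mean_squared_error(T, w, b):
    """Performance criterion: average squared error over the training set."""
    return sum((y - (w * x + b)) ** 2 for x, y in T) / len(T)

# Hypothetical training examples drawn from the relationship y = 2x + 1.
T = [(k / 10.0, 2.0 * (k / 10.0) + 1.0) for k in range(11)]

w, b = train_linear_unit(T)
print("learned w, b:", w, b)
print("training error:", mean_squared_error(T, w, b))
```

Because the examples here are noise-free and lie exactly on a line, the learned parameters approach the ones that generated the data; with noisy data the unit would instead settle near a least-squares compromise.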
In a sense, neural networks are a class of learning machines, and the
backpropagation algorithm is a special case of a general inductive principle called
the empirical risk minimization principle [76]. As such, the learning capability
of neural networks falls under the same line of analysis as learning machines.
We indicate here the ideas used in the analysis.
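For orientation, the empirical risk minimization principle can be written as follows. The symbols $L$ (a loss function), $\mathcal{F}$ (the class of functions realized by the chosen architecture), $P$ (the unknown distribution of input-output pairs), and $f_N$ are notation introduced here for illustration and may differ from that of [76].

```latex
% Expected risk: not computable, since the distribution P is unknown
R(f) = \int L\bigl(y, f(x)\bigr)\, dP(x, y)

% Empirical risk: the average loss over the N training pairs (x^q, y^q)
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{q=1}^{N} L\bigl(y^q, f(x^q)\bigr)

% Empirical risk minimization: choose the minimizer within the class
f_N = \arg\min_{f \in \mathcal{F}} R_{\mathrm{emp}}(f)
```

With the squared loss $L(y, o) = \|y - o\|^2$ and $\mathcal{F}$ the functions computed by a fixed network architecture, minimizing $R_{\mathrm{emp}}$ by gradient descent over the weights is the setting in which backpropagation supplies the gradients.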
Recall that a control law is a function $\varphi$ from an input space $X$ to an
output space $Y$ that predicts the output for a given input. If we do not have a
mathematical model, or if we do but have no analytic method to find $\varphi$
from it, we need to look for other alternatives.
Consider the situation where the information that will help us find $\varphi$ is a
set of desirable pairs $(x_i, y_i)$, $i = 1, \dots, m$, where $y_i = \varphi(x_i)$.
Then the obvious problem is to find a function $\varphi_N$ that fits through these
points in such a way that prediction is good; that is, for new $x$ the fitted
function $\varphi_N$ will produce values $\varphi_N(x)$ close enough to $\varphi(x)$.
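The idea of fitting through known pairs and then judging the fit on new inputs can be illustrated with a deliberately simple fitting method. The sketch below uses piecewise-linear interpolation rather than a neural network, and takes `math.sin` on $[0, \pi]$ as a stand-in for the unknown function; both are assumptions made only for illustration, as are all function names.

```python
import bisect
import math

def fit_piecewise_linear(points):
    """Return phi_N passing through every (x_i, y_i), linear in between."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]

    def phi_N(x):
        # Outside the sampled range, hold the nearest endpoint value.
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        k = bisect.bisect_right(xs, x)   # xs[k-1] <= x < xs[k]
        t = (x - xs[k - 1]) / (xs[k] - xs[k - 1])
        return ys[k - 1] + t * (ys[k] - ys[k - 1])

    return phi_N

def sample(phi, N, a, b):
    """N equally spaced pairs (x_i, phi(x_i)) on [a, b]; requires N >= 2."""
    xs = [a + (b - a) * i / (N - 1) for i in range(N)]
    return [(x, phi(x)) for x in xs]

def max_error_on_new_inputs(phi, phi_N, a, b, checks=1000):
    """Largest |phi(x) - phi_N(x)| on a fine grid of inputs in [a, b]."""
    grid = (a + (b - a) * j / checks for j in range(checks + 1))
    return max(abs(phi(x) - phi_N(x)) for x in grid)

phi = math.sin                      # stand-in for the unknown function
a, b = 0.0, math.pi

coarse = fit_piecewise_linear(sample(phi, 5, a, b))
fine = fit_piecewise_linear(sample(phi, 50, a, b))
err_coarse = max_error_on_new_inputs(phi, coarse, a, b)
err_fine = max_error_on_new_inputs(phi, fine, a, b)
print("max error with 5 pairs: ", err_coarse)
print("max error with 50 pairs:", err_fine)
```

Both fitted functions reproduce their own sample points, yet only the one built from more pairs predicts well at inputs it has not seen, which is the sense of "prediction is good" above.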
To carry out the program above, we need the specification of performance
criteria and a way to construct $\varphi_N$ that meets these criteria. The performance
criterion for a good approximation is specified as an acceptable error of
approximation. A neural network is a tool that can be used to achieve these two