192 CHAPTER 5. NEURAL NETWORKS FOR CONTROL
5.8 Practical issues in training
There is no prescribed methodology that predetermines a neural network archi-
tecture for a given problem. Some trial and error is required to determine a
sufficiently suitable model. The following is a list of several aspects to keep in
mind when selecting an appropriate neural network structure.
1. More than one hidden layer may be beneficial for some applications, but
in general, one hidden layer is sufficient.
2. The learning rate η should be chosen in the open interval (0, 1): although
a large η might result in more rapid convergence, a small η avoids
overshooting the solution.
3. Training a neural network means creating a general model of an input-
output relationship from samples. This model can then be applied to new
data sets of the problem, that is, it can generalize to new data.
4. Overfitting means poor generalization on new data. This happens when
the number of parameters (weights) is greater than the number of
constraints (training samples).
5. There are no general conclusions about how many neurons should be
included in the hidden layer.
6. The choice of initial weights will influence whether the neural network
reaches a global or a local minimum of the error E, and if so, how quickly
it converges (a property of the gradient descent method). The update of
the weights w_ik depends on both f'_k of the upper layer and the output of
neuron i in the lower layer. For this reason, it is important to avoid
choices of initial weights that would make it likely that either of these
quantities is zero.
Initial weights must not be too large, or the initial input signals will be
likely to fall into the region where the derivative of the activation function
has a very small value (the saturation region). On the other hand, if the
initial weights are too small, the net input to a hidden or output neuron
will be close to zero, which also causes extremely slow learning. As a
common procedure, the initial weights are chosen at random, either
between −1 and 1 or in some other appropriate interval.
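The saturation effect described above can be seen directly from the sigmoid derivative. The following sketch (layer sizes, the logistic activation, and the scaling factor are assumptions made for this example, not taken from the text) compares the derivative at moderate net inputs with the derivative when the weights are overly large:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_deriv(x):
    # Derivative of the logistic sigmoid: f'(x) = f(x) * (1 - f(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# Hypothetical layer sizes, chosen only for illustration.
n_in, n_hidden = 4, 8

# Common procedure: initial weights drawn at random from [-1, 1].
W = rng.uniform(-1.0, 1.0, size=(n_hidden, n_in))
x = rng.uniform(-1.0, 1.0, size=n_in)

net_moderate = W @ x             # net inputs stay near zero
net_saturated = (100.0 * W) @ x  # overly large weights push into saturation

# In the saturation region the derivative is nearly zero, so weight
# updates (which are proportional to it) nearly vanish and learning stalls.
print(sigmoid_deriv(net_moderate).mean())
print(sigmoid_deriv(net_saturated).mean())
```

Since the weight updates scale with this derivative, the saturated network learns far more slowly than the moderately initialized one.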
7. How long do we need to train a neural network? One could divide a
training set into two disjoint subsets, I and II. Use I to train the neural
network and use II for testing. During the training, one could compute the
errors on II. If these errors decrease, then continue the training. If they
increase, then stop the training, because the neural network is starting to
memorize the set I too specifically and consequently is losing its ability to
generalize.
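The stopping rule in point 7 can be sketched as follows. The network (one hidden sigmoid layer trained by plain gradient descent), the toy regression data, and the patience threshold are all assumptions made for this example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: noisy 1-D regression, split into the two disjoint subsets
# I (training) and II (testing) described in the text.
X = rng.uniform(-1.0, 1.0, size=(200, 1))
y = np.sin(3.0 * X) + 0.1 * rng.standard_normal((200, 1))
X_I, y_I = X[:150], y[:150]
X_II, y_II = X[150:], y[150:]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer; initial weights drawn at random from [-1, 1].
n_hidden = 10
W1 = rng.uniform(-1, 1, size=(1, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.uniform(-1, 1, size=(n_hidden, 1)); b2 = np.zeros(1)
eta = 0.1  # learning rate in (0, 1)

def forward(X):
    H = sigmoid(X @ W1 + b1)
    return H, H @ W2 + b2

def mse(X, y):
    _, out = forward(X)
    return float(np.mean((out - y) ** 2))

init_err = mse(X_II, y_II)            # error on II before any training
best_err, patience, bad_epochs = np.inf, 20, 0

for epoch in range(5000):
    # One gradient-descent step on subset I (backprop for squared error).
    H, out = forward(X_I)
    d_out = 2.0 * (out - y_I) / len(X_I)
    d_H = (d_out @ W2.T) * H * (1.0 - H)
    W2 -= eta * H.T @ d_out
    b2 -= eta * d_out.sum(axis=0)
    W1 -= eta * X_I.T @ d_H
    b1 -= eta * d_H.sum(axis=0)

    # Monitor the error on subset II after each epoch.
    err_II = mse(X_II, y_II)
    if err_II < best_err:
        best_err, bad_epochs = err_II, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # error on II keeps rising: stop before memorizing I
```

The patience counter is a practical refinement: rather than stopping at the first increase of the error on II, which may be noise, training stops only after the error has failed to improve for several consecutive epochs.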