tron, the input layer has an additional constant input called the bias. However,
the third unit does not have any connections to the input layer. Its input con-
sists of the output of units A and B (either 0 or 1) and another constant bias
unit. These three units make up the hidden layer of the multilayer perceptron.
They are called “hidden” because the units have no direct connection to the envi-
ronment. This layer is what enables the system to represent XOR. You can verify
this by trying all four possible combinations of input signals. For example,
if attribute a1 has value 1 and a2 has value 1, then unit A will output 1 (because
1 × 1 + 1 × 1 - 0.5 × 1 > 0), unit B will output 0 (because -1 × 1 + (-1) × 1 + 1.5 × 1 < 0),
and unit C will output 0 (because 1 × 1 + 1 × 0 + (-1.5) × 1 < 0).
This is the correct answer. Closer inspection of the behavior of the three units
reveals that the first one represents OR, the second represents NAND (NOT
combined with AND), and the third represents AND. Together they represent
the expression (a1 OR a2) AND (a1 NAND a2), which is precisely the definition
of XOR.
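The construction is easy to check in code. The following short Python sketch (our own illustration, not part of any particular software package) hard-wires the weights given above into threshold units and verifies all four input combinations against XOR:

def step(weighted_sum):
    # Threshold (step) activation: fire if the weighted sum exceeds zero.
    return 1 if weighted_sum > 0 else 0

def xor_network(a1, a2):
    bias = 1
    unit_a = step(1 * a1 + 1 * a2 - 0.5 * bias)           # unit A: OR of a1 and a2
    unit_b = step(-1 * a1 - 1 * a2 + 1.5 * bias)          # unit B: NAND of a1 and a2
    unit_c = step(1 * unit_a + 1 * unit_b - 1.5 * bias)   # unit C: AND of A and B
    return unit_c

for a1 in (0, 1):
    for a2 in (0, 1):
        assert xor_network(a1, a2) == (a1 ^ a2)
        print(a1, a2, "->", xor_network(a1, a2))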
As this example illustrates, any expression from propositional calculus can be
converted into a multilayer perceptron, because the three connectives AND, OR,
and NOT are sufficient for this and we have seen how each can be represented
using a perceptron. Individual units can be connected together to form arbi-
trarily complex expressions. Hence, a multilayer perceptron has the same
expressive power as, say, a decision tree. In fact, it turns out that a two-layer per-
ceptron (not counting the input layer) is sufficient. In this case, each unit in the
hidden layer corresponds to a variant of AND—a variant because we assume
that it may negate some of the inputs before forming the conjunction—joined
by an OR that is represented by a single unit in the output layer. In other words,
each node in the hidden layer has the same role as a leaf in a decision tree or a
single rule in a set of decision rules.
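To make the two-layer construction concrete, here is a small Python sketch (again ours, purely for illustration) that builds such a network from an expression in disjunctive normal form: one hidden unit per conjunction, with negated inputs given weight -1, and a single output unit that ORs the hidden units together. The (index, polarity) encoding of literals and the choice of thresholds are simply conveniences for the example:

def step(weighted_sum):
    return 1 if weighted_sum > 0 else 0

def conjunction_unit(inputs, literals):
    # literals is a list of (attribute index, polarity) pairs; polarity True
    # means the attribute appears unnegated, False means it is negated.
    positives = sum(1 for _, polarity in literals if polarity)
    total = sum((1 if polarity else -1) * inputs[index]
                for index, polarity in literals)
    # Fires only when every unnegated literal is 1 and every negated one is 0.
    return step(total - (positives - 0.5))

def two_layer_network(inputs, dnf):
    hidden = [conjunction_unit(inputs, conjunction) for conjunction in dnf]
    # The output unit is an OR over the hidden units.
    return step(sum(hidden) - 0.5)

# XOR in disjunctive normal form: (a1 AND NOT a2) OR (NOT a1 AND a2)
xor_dnf = [[(0, True), (1, False)], [(0, False), (1, True)]]
for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, "->", two_layer_network([a1, a2], xor_dnf))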
The big question is how to learn a multilayer perceptron. There are two
aspects to the problem: learning the structure of the network and learning the
connection weights. It turns out that there is a relatively simple algorithm for
determining the weights given a fixed network structure. This algorithm is called
backpropagation and is described in the next section. However, although there
are many algorithms that attempt to identify network structure, this aspect of
the problem is commonly solved through experimentation—perhaps combined
with a healthy dose of expert knowledge. Sometimes the network can be
separated into distinct modules that represent identifiable subtasks (e.g., recog-
nizing different components of an object in an image recognition problem),
which opens up a way of incorporating domain knowledge into the learning
process. Often a single hidden layer is all that is necessary, and an appropriate
number of units for that layer is determined by maximizing the estimated
accuracy.
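As an illustration of that last step, the following sketch selects the number of hidden units by cross-validated accuracy. It uses scikit-learn and one of its bundled datasets purely for convenience; this is our own example rather than a prescribed procedure, and any toolkit that offers a multilayer perceptron and cross-validation would serve equally well:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

best_size, best_score = None, 0.0
for size in (2, 4, 8, 16, 32):
    # One hidden layer with `size` units; inputs are standardized first.
    model = make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(size,),
                                        max_iter=2000, random_state=0))
    score = cross_val_score(model, X, y, cv=5).mean()
    if score > best_score:
        best_size, best_score = size, score

print("chosen hidden-layer size:", best_size,
      "estimated accuracy:", round(best_score, 3))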