tron, the input layer has an additional constant input called the bias. However, the third unit does not have any connections to the input layer. Its input consists of the output of units A and B (either 0 or 1) and another constant bias unit. These three units make up the hidden layer of the multilayer perceptron. They are called "hidden" because the units have no direct connection to the environment. This layer is what enables the system to represent XOR. You can verify this by trying all four possible combinations of input signals. For example, if attribute a1 has value 1 and a2 has value 1, then unit A will output 1 (because 1 × 1 + 1 × 1 − 0.5 × 1 > 0), unit B will output 0 (because −1 × 1 − 1 × 1 + 1.5 × 1 < 0), and unit C will output 0 (because 1 × 1 + 1 × 0 − 1.5 × 1 < 0).
This is the correct answer. Closer inspection of the behavior of the three units reveals that the first one represents OR, the second represents NAND (NOT combined with AND), and the third represents AND. Together they represent the expression (a1 OR a2) AND (a1 NAND a2), which is precisely the definition of XOR.
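
To make the verification concrete, here is a minimal sketch in Python (not from the book; the function names are illustrative) that implements the three hard-threshold units with the weights given above, each with a constant bias input of 1:

def unit(weights, bias, inputs):
    """Output 1 if the weighted sum (including the bias term) is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def xor_network(a1, a2):
    a = unit([1, 1], -0.5, [a1, a2])    # unit A: OR of a1, a2
    b = unit([-1, -1], 1.5, [a1, a2])   # unit B: NAND of a1, a2
    return unit([1, 1], -1.5, [a, b])   # unit C: AND of A and B

for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, "->", xor_network(a1, a2))
# prints 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0

Running the loop over all four input combinations reproduces the XOR truth table, confirming the hand calculation above.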
As this example illustrates, any expression from propositional calculus can be converted into a multilayer perceptron, because the three connectives AND, OR, and NOT are sufficient for this and we have seen how each can be represented using a perceptron. Individual units can be connected together to form arbitrarily complex expressions. Hence, a multilayer perceptron has the same expressive power as, say, a decision tree. In fact, it turns out that a two-layer perceptron (not counting the input layer) is sufficient. In this case, each unit in the hidden layer corresponds to a variant of AND (a variant because we assume that it may negate some of the inputs before forming the conjunction), joined by an OR that is represented by a single unit in the output layer. In other words, each node in the hidden layer has the same role as a leaf in a decision tree or a single rule in a set of decision rules.
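
The following sketch spells out this two-layer construction for XOR written in disjunctive normal form, (a1 AND NOT a2) OR (NOT a1 AND a2). It is not taken from the book: the clause encoding (+1 for a plain input, -1 for a negated one) and the thresholds are one straightforward way to realize the construction described above.

def step(total):
    return 1 if total > 0 else 0

def conjunction_unit(literals, inputs):
    """A hidden unit that fires only when every plain literal is 1
    and every negated literal is 0."""
    positives = sum(1 for sign in literals if sign > 0)
    total = sum(sign * x for sign, x in zip(literals, inputs))
    return step(total - (positives - 0.5))

def two_layer(inputs, clauses):
    hidden = [conjunction_unit(lits, inputs) for lits in clauses]
    return step(sum(hidden) - 0.5)     # output unit: OR of the hidden units

xor_clauses = [(+1, -1), (-1, +1)]     # a1 AND NOT a2; NOT a1 AND a2
for a1 in (0, 1):
    for a2 in (0, 1):
        print(a1, a2, "->", two_layer((a1, a2), xor_clauses))

Each hidden unit plays exactly the role the text assigns it: one conjunction per rule, with a single OR unit collecting them at the output.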
The big question is how to learn a multilayer perceptron. There are two
aspects to the problem: learning the structure of the network and learning the
connection weights. It turns out that there is a relatively simple algorithm for
determining the weights given a fixed network structure. This algorithm is called
backpropagation and is described in the next section. However, although there are many algorithms that attempt to identify network structure, this aspect of the problem is commonly solved through experimentation, perhaps combined with a healthy dose of expert knowledge. Sometimes the network can be separated into distinct modules that represent identifiable subtasks (e.g., recognizing different components of an object in an image recognition problem), which opens up a way of incorporating domain knowledge into the learning process. Often a single hidden layer is all that is necessary, and an appropriate number of units for that layer is determined by maximizing the estimated accuracy.
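
As one illustration of that last step, the sketch below picks a hidden-layer size by cross-validated accuracy. It assumes scikit-learn rather than the book's own software, and the candidate sizes and cross-validation settings are illustrative choices, not prescriptions from the text.

from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def best_hidden_size(X, y, candidates=(2, 4, 8, 16, 32)):
    """Return the hidden-layer size with the highest estimated accuracy."""
    best_size, best_score = None, -1.0
    for h in candidates:
        net = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000,
                            random_state=0)
        score = cross_val_score(net, X, y, cv=5).mean()  # estimated accuracy
        if score > best_score:
            best_size, best_score = h, score
    return best_size, best_score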
