Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

examples that satisfy it divided by the number that satisfy its condition, whether
or not they also satisfy its conclusion. For example, the condition in the top
center box applies to 52 of the examples, and 49 of them are Iris versicolor. The
strength of this representation is that you can get a very good feeling for the
effect of the rules from the boxes toward the left-hand side; the boxes at the
right cover just a few exceptional cases.
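
The fraction attached to a box can be computed directly from its two counts, as in the 49-of-52 example above. The following Python sketch is only an illustration: the function name rule_fraction, the dictionary keys, and the four toy instances are assumptions made here, not part of the Iris dataset or of any particular tool.

```python
def rule_fraction(instances, condition, conclusion):
    """Return (correct, covered) for one rule.

    covered: instances that satisfy the rule's condition
    correct: covered instances whose class equals the rule's conclusion
    """
    covered = [x for x in instances if condition(x)]
    correct = sum(1 for x in covered if x["class"] == conclusion)
    return correct, len(covered)


# Invented toy instances, used only to exercise the function.
toy = [
    {"petal_length": 4.0, "class": "Iris versicolor"},
    {"petal_length": 4.5, "class": "Iris versicolor"},
    {"petal_length": 5.0, "class": "Iris virginica"},
    {"petal_length": 1.4, "class": "Iris setosa"},
]

correct, covered = rule_fraction(
    toy, lambda x: x["petal_length"] >= 2.45, "Iris versicolor"
)
print(f"{correct}/{covered}")  # prints 2/3
```

Applied to the full dataset with the condition and conclusion of the top center box, the same two counts would be the 49 and 52 quoted in the text.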
To create these rules, the default is first set to Iris setosa by taking the most
frequently occurring class in the dataset. This is an arbitrary choice because for
this dataset all classes occur exactly 50 times; as shown in Figure 6.7, this default
“rule” is correct in 50 of 150 cases. Then the best rule that predicts another class
is sought. In this case it is

if petal length ≥ 2.45 and petal length < 5.355
and petal width < 1.75 then Iris versicolor

This rule covers 52 instances, of which 49 are Iris versicolor. It divides the dataset
into two subsets: the 52 instances that do satisfy the condition of the rule and
the remaining 98 that do not.
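
The two steps just described, choosing a default class and splitting the data on a rule's condition, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the helper names default_class, first_rule, and split are hypothetical, the six instances are invented, and the search for the "best" rule is not shown because the passage does not specify it. Only the thresholds in first_rule come from the text.

```python
from collections import Counter


def default_class(instances):
    """Most frequent class. Counter.most_common lists equal counts in
    first-seen order, so a tie is broken arbitrarily by data order."""
    return Counter(x["class"] for x in instances).most_common(1)[0][0]


def first_rule(x):
    """Condition of the first rule quoted in the text."""
    return 2.45 <= x["petal_length"] < 5.355 and x["petal_width"] < 1.75


def split(instances, condition):
    """Divide instances into those that satisfy the condition and the rest."""
    covered = [x for x in instances if condition(x)]
    rest = [x for x in instances if not condition(x)]
    return covered, rest


# Invented toy instances: two per class, so the default is a tie.
toy = [
    {"petal_length": 1.4, "petal_width": 0.2, "class": "Iris setosa"},
    {"petal_length": 4.5, "petal_width": 1.5, "class": "Iris versicolor"},
    {"petal_length": 4.0, "petal_width": 1.3, "class": "Iris versicolor"},
    {"petal_length": 5.0, "petal_width": 1.5, "class": "Iris virginica"},
    {"petal_length": 6.0, "petal_width": 2.5, "class": "Iris virginica"},
    {"petal_length": 1.5, "petal_width": 0.2, "class": "Iris setosa"},
]

covered, rest = split(toy, first_rule)
print(default_class(toy))        # Iris setosa (tie, first seen)
print(len(covered), len(rest))   # 3 3
print(default_class(covered))    # Iris versicolor
```

On the real dataset the same split would yield the 52 covered instances and the 98 remaining ones mentioned above.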
We work on the former subset first. The default class for these instances is
Iris versicolor: there are only three exceptions, all of which happen to be Iris
virginica. The best rule for this subset that does not predict Iris versicolor is
identified next:

if petal length ≥ 4.95 and petal width < 1.55 then Iris virginica

It covers two of the three Iris virginicas and nothing else. Again it divides the
subset into two: those instances that satisfy its condition and those that do
not. Fortunately, in this case, all instances that satisfy the condition do
indeed have the class Iris virginica, so there is no need for a further exception.
However, the remaining instances still include the third Iris virginica, along with
49 Iris versicolors, which are the default at this point. Again the best rule is
sought:

if sepal length < 4.95 and sepal width ≥ 2.45 then Iris virginica

This rule covers the remaining Iris virginica and nothing else, so it also has no
exceptions. Furthermore, all remaining instances in the subset that do not satisfy
its condition have the class Iris versicolor, which is the default, so no more needs
to be done.
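
The default and the three rules found so far nest naturally as exceptions within exceptions. The Python sketch below encodes only this part of the structure and is an assumption-laden illustration: the function name classify and the dictionary keys are hypothetical, the test measurements are invented, and instances that fail the first rule simply keep the global default here because the rules for that second subset lie outside this passage.

```python
def classify(x):
    """Apply the default and the three rules quoted in the text."""
    label = "Iris setosa"  # global default
    # First rule: an exception to the global default.
    if 2.45 <= x["petal_length"] < 5.355 and x["petal_width"] < 1.75:
        label = "Iris versicolor"  # default within this subset
        # Exception to the versicolor rule.
        if x["petal_length"] >= 4.95 and x["petal_width"] < 1.55:
            label = "Iris virginica"
        # Alternative exception, tried only when the previous one fails.
        elif x["sepal_length"] < 4.95 and x["sepal_width"] >= 2.45:
            label = "Iris virginica"
    # Instances failing the first rule are not refined in this sketch.
    return label


# Invented measurements, one per path through the rules.
a = {"sepal_length": 5.9, "sepal_width": 3.0, "petal_length": 4.2, "petal_width": 1.5}
b = {"sepal_length": 6.0, "sepal_width": 2.2, "petal_length": 5.0, "petal_width": 1.5}
c = {"sepal_length": 4.9, "sepal_width": 2.5, "petal_length": 4.5, "petal_width": 1.7}
d = {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}

for x in (a, b, c, d):
    print(classify(x))
# Iris versicolor
# Iris virginica
# Iris virginica
# Iris setosa
```

Writing the second exception as an elif mirrors the text: it is sought only among the instances that the first exception leaves behind.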
Return now to the second subset created by the initial rule, the instances that
do not satisfy the condition

petal length ≥ 2.45 and petal length < 5.355 and petal width < 1.75

212 CHAPTER 6 | IMPLEMENTATIONS: REAL MACHINE LEARNING SCHEMES
