Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

mation the learning algorithm will be given is the width, height, and number of
sides of each block. The training data is shown in Table 3.2.
A propositional rule set that might be produced for this data is:

if width ≥ 3.5 and height < 7.0 then lying
if height ≥ 3.5 then standing
In case you’re wondering, 3.5 is chosen as the breakpoint for width because it is
halfway between the width of the thinnest lying block, namely 4, and the width
of the fattest standing block whose height is less than 7, namely 3. Also, 7.0 is
chosen as the breakpoint for height because it is halfway between the height of
the tallest lying block, namely 6, and the shortest standing block whose width
is greater than 3.5, namely 8. It is common to place numeric thresholds halfway
between the values that delimit the boundaries of a concept.
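
The midpoint calculation described above can be checked against Table 3.2 with a short sketch. The names BLOCKS, midpoint, width_breakpoint, and height_breakpoint are illustrative and do not come from the text; the constants 7 and 3.5 used as filters are taken directly from the paragraph above.

```python
# Training data transcribed from Table 3.2: (width, height, sides, class).
BLOCKS = [
    (2, 4, 4, "standing"),
    (3, 6, 4, "standing"),
    (4, 3, 4, "lying"),
    (7, 8, 3, "standing"),
    (7, 6, 3, "lying"),
    (2, 9, 4, "standing"),
    (9, 1, 4, "lying"),
    (10, 2, 3, "lying"),
]


def midpoint(low, high):
    """Return the value halfway between two numbers."""
    return (low + high) / 2


# Width breakpoint: halfway between the fattest standing block whose
# height is less than 7 and the thinnest lying block.
fattest_standing = max(w for w, h, _, c in BLOCKS if c == "standing" and h < 7)
thinnest_lying = min(w for w, _, _, c in BLOCKS if c == "lying")
width_breakpoint = midpoint(fattest_standing, thinnest_lying)

# Height breakpoint: halfway between the tallest lying block and the
# shortest standing block whose width is greater than 3.5.
tallest_lying = max(h for _, h, _, c in BLOCKS if c == "lying")
shortest_standing = min(h for w, h, _, c in BLOCKS if c == "standing" and w > 3.5)
height_breakpoint = midpoint(tallest_lying, shortest_standing)

print(width_breakpoint, height_breakpoint)  # 3.5 7.0
```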
Although these two rules work well on the examples given, they are not very
good. Many new blocks would not be classified by either rule (e.g., one with
width 1 and height 2), and it is easy to devise many legitimate blocks that the
rules would not fit.
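
As a rough illustration, the two propositional rules can be written as a small function. This sketch assumes the rules are tried in the order given and that a block matched by neither rule is left unclassified, represented here by None; the function name classify_propositional is hypothetical.

```python
def classify_propositional(width, height):
    """Apply the two propositional rules in order; None means no rule fired."""
    if width >= 3.5 and height < 7.0:
        return "lying"
    if height >= 3.5:
        return "standing"
    return None  # neither rule covers this block


# The eight training examples from Table 3.2 as (width, height, class).
training = [
    (2, 4, "standing"), (3, 6, "standing"), (4, 3, "lying"), (7, 8, "standing"),
    (7, 6, "lying"), (2, 9, "standing"), (9, 1, "lying"), (10, 2, "lying"),
]

# Every training example is classified correctly ...
print(all(classify_propositional(w, h) == c for w, h, c in training))  # True

# ... but the new block with width 1 and height 2 is covered by neither rule.
print(classify_propositional(1, 2))  # None
```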
A person classifying the eight blocks would probably notice that
“standing blocks are those that are taller than they are wide.” This rule does
not compare attribute values with constants; it compares attributes with each
other:

if width > height then lying
if height > width then standing
The actual values of the height and width attributes are not important; just the
result of comparing the two. Rules of this form are called relational, because
they express relationships between attributes, rather than propositional, which
denotes a fact about just one attribute.
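
The relational rules can be sketched in the same way. The function name classify_relational is hypothetical, and returning None when width equals height is an assumption of this sketch: the two rules as stated simply do not cover that case.

```python
def classify_relational(width, height):
    """Classify a block by comparing its two attributes with each other."""
    if width > height:
        return "lying"
    if height > width:
        return "standing"
    return None  # width == height: neither rule fires


# The eight training examples from Table 3.2 as (width, height, class).
training = [
    (2, 4, "standing"), (3, 6, "standing"), (4, 3, "lying"), (7, 8, "standing"),
    (7, 6, "lying"), (2, 9, "standing"), (9, 1, "lying"), (10, 2, "lying"),
]

# All eight training examples are classified correctly.
print(all(classify_relational(w, h) == c for w, h, c in training))  # True

# A block with width 1 and height 2 is classified by the second rule,
# because only the comparison between the two attributes matters.
print(classify_relational(1, 2))  # standing
```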

74 CHAPTER 3 | OUTPUT: KNOWLEDGE REPRESENTATION


Table 3.2 Training data for the shapes problem.

Width  Height  Sides  Class

2      4       4      standing
3      6       4      standing
4      3       4      lying
7      8       3      standing
7      6       3      lying
2      9       4      standing
9      1       4      lying
10     2       3      lying