Data Mining: Practical Machine Learning Tools and Techniques, Second Edition

of the concept that is to be learned. Of course, the instances are not really
independent (there are plenty of relationships among different rows of the table!),
but they are independent as far as the concept of sisterhood is concerned. Most
machine learning schemes will still have trouble dealing with this kind of data,
as we will see in Section 3.6, but at least the problem has been recast into the
right form. A simple rule for the sister-of relation is as follows:

If second person’s gender = female
   and first person’s parent1 = second person’s parent1
   then sister-of = yes
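The rule can be read as a test applied to a single flattened instance. The following is a minimal Python sketch, not taken from the text: the function name `sister_of` and the dictionary keys `name`, `gender`, `parent1`, and `parent2` are hypothetical, and it assumes that an unknown parent (shown as ? in Table 2.3) is stored as `None` and never counts as a match.

```python
def sister_of(first, second):
    """Apply the simple sister-of rule to one (first person, second person) pair."""
    # Assumption: an unknown parent is stored as None and never matches anything.
    if first["parent1"] is None or second["parent1"] is None:
        return "no"
    # The rule itself: the second person is female and both share parent1.
    if second["gender"] == "female" and first["parent1"] == second["parent1"]:
        return "yes"
    return "no"


steven = {"name": "Steven", "gender": "male", "parent1": "Peter", "parent2": "Peggy"}
pam = {"name": "Pam", "gender": "female", "parent1": "Peter", "parent2": "Peggy"}
ian = {"name": "Ian", "gender": "male", "parent1": "Grace", "parent2": "Ray"}

print(sister_of(steven, pam))  # yes: Pam is female and shares parent1 with Steven
print(sister_of(pam, steven))  # no: the second person is male
print(sister_of(ian, pam))     # no: parent1 differs
```

As written, the rule checks only parent1, so it relies on the two parents being listed in a consistent order for every person.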
This example shows how you can take a relationship between different nodes
of a tree and recast it into a set of independent instances. In database terms, you
take two relations and join them together to make one, a process of flattening
that is technically called denormalization. It is always possible to do this with
any (finite) set of (finite) relations.
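The flattening step can be sketched in Python as a join of the family table with itself, producing one wide row per ordered pair of people. This is an illustrative sketch only: the tuple layout, the helper name `flatten`, the use of `None` for an unknown parent, and the decision to skip pairing a person with themselves are all assumptions, and only the rows visible in Table 2.3 are used.

```python
from itertools import permutations

# Rows of the family table as (name, gender, parent1, parent2).
# None stands for an unknown parent.
people = [
    ("Peter", "male", None, None),
    ("Peggy", "female", None, None),
    ("Steven", "male", "Peter", "Peggy"),
    ("Graham", "male", "Peter", "Peggy"),
    ("Pam", "female", "Peter", "Peggy"),
    ("Ian", "male", "Grace", "Ray"),
]


def flatten(rows):
    """Join the table with itself: one flat 8-field row per ordered pair of distinct people."""
    return [first + second for first, second in permutations(rows, 2)]


instances = flatten(people)
print(len(instances))  # 30 ordered pairs from 6 people
print(instances[0])    # Peter's fields followed by Peggy's fields
```

Each resulting row carries all the attributes of both people, so a learning scheme can treat it as a single independent instance. The number of rows grows roughly with the square of the number of people.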
The structure of Table 2.4 can be used to describe any relationship between
two people—grandparenthood, second cousins twice removed, and so on. Rela-

2.2 WHAT’S IN AN EXAMPLE? 47


Table 2.3 Family tree represented as a table.

Name    Gender  Parent1  Parent2

Peter   male    ?        ?
Peggy   female  ?        ?
Steven  male    Peter    Peggy
Graham  male    Peter    Peggy
Pam     female  Peter    Peggy
Ian     male    Grace    Ray
...

Table 2.4 The sister-of relation represented in a table.

First person                        Second person
Name    Gender  Parent1  Parent2    Name    Gender  Parent1  Parent2  Sister of?

Steven  male    Peter    Peggy      Pam     female  Peter    Peggy    yes
Graham  male    Peter    Peggy      Pam     female  Peter    Peggy    yes
Ian     male    Grace    Ray        Pippa   female  Grace    Ray      yes
Brian   male    Grace    Ray        Pippa   female  Grace    Ray      yes
Anna    female  Pam      Ian        Nikki   female  Pam      Ian      yes
Nikki   female  Pam      Ian        Anna    female  Pam      Ian      yes

all the rest                                                          no