Data Mining: Practical Machine Learning Tools and Techniques, Second Edition
are working with binary data. Weights are unchanged if the attribute value is 0, because then they do not participate in the dec ...
4.7 INSTANCE-BASED LEARNING 129 When comparing distances it is not necessary to perform the square root operation; the sums of ...
and select the smallest. This procedure is linear in the number of training instances: in other words, the time it takes to make ...
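The linear scan described in these excerpts can be sketched roughly as follows. This is a minimal illustration rather than the book's own code: the function names squared_distance and nearest_neighbor and the toy data are assumptions made for the example. It compares squared Euclidean distances, which preserves their ordering without a square root, and it visits every training instance once, so the cost of one prediction grows linearly with the size of the training set.

```python
from typing import List, Sequence, Tuple

Instance = Tuple[Sequence[float], str]  # (attribute values, class label)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of squared attribute differences (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor(query: Sequence[float], training: List[Instance]) -> Instance:
    """Return the training instance closest to `query` by a full linear scan."""
    if not training:
        raise ValueError("training set must not be empty")
    # One distance computation per training instance: O(n) per query.
    return min(training, key=lambda inst: squared_distance(query, inst[0]))


# Hypothetical toy data, for illustration only.
training = [((1.0, 1.0), "a"), ((5.0, 5.0), "b"), ((2.0, 3.0), "c")]
closest = nearest_neighbor((2.0, 2.5), training)
print(closest[1])
```

Because the square root is monotonic, ranking by the squared sums selects the same neighbor as ranking by true Euclidean distance.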
How do you build a kD-tree from a dataset? Can it be updated efficiently as new training example ...
In a typical case, this algorithm is far faster than examining all points to find the nearest neighbor. The work involved in fin ...
Figure 4.13 were any bigger, which it would be if the black instance were a little further from ...
134 CHAPTER 4 | ALGORITHMS: THE BASIC METHODS Figure 4.14 Ball tree for 16 training instances ...
Choose the point in the ball that is farthest from its center, and then a second point that is f ...
The nearest-neighbor method originated many decades ago, and statisticians analyzed k-nearest-neighbor schemes in the early 1950 ...
As we saw in Section 3.9, there are different ways in which the result of clustering can be expressed. The groups that are ide ...
minimum people often run the algorithm several times with different initial choices and choose the best final result—the one wit ...
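The practice of rerunning the algorithm from different starting points might be sketched as below. This is an illustrative sketch under stated assumptions, not the book's implementation: the function names kmeans_once and kmeans_restarts are hypothetical, initial centers are drawn at random from the data, and "best" is taken to mean the lowest total squared distance from each point to its nearest center, a criterion the truncated excerpt does not confirm.

```python
import random
from typing import List, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points: List[Point], k: int, rng: random.Random,
                max_iter: int = 100) -> Tuple[List[Point], float]:
    """One k-means run from randomly chosen initial centers."""
    centers = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign every point to its nearest center.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Assumed quality measure: total squared distance to the nearest center.
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, cost


def kmeans_restarts(points: List[Point], k: int, restarts: int = 10,
                    seed: int = 0) -> Tuple[List[Point], float]:
    """Run k-means several times and keep the run with the lowest cost."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(restarts))
    return min(runs, key=lambda run: run[1])
```

Each restart is independent, so the only extra cost is running the basic procedure several times; the seed parameter is there only to make the sketch reproducible.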
can be updated immediately. If not, look inside the node by proceeding recursively down the tree. Figure 4.16 shows the same i ...
Figure 4.16 A ball tree: (a) two cluster centers and their dividing line and (b) the c ...
Bayes was an eighteenth-century English philosopher who set out his theory of probability in “An essay towards solving a problem ...
a linear threshold unit as a binary test of whether a linear function is greater or less than zero and a linear machine as a set o ...
Evaluation is the key to making real progress in data mining. There are lots of ways of inferring structure from data: we have e ...
and marked manually—a skilled and labor-intensive process—before being used as training data. Even in the credit card applicatio ...
of each instance in the training set, which after all is why we can use it for training. We are not generally interested in le ...
this data may be used to determine an estimate of the future error rate. In such situations people often talk about three datase ...
rather than error rate, so this corresponds to a success rate of 75%. Now, this is only an estimate. What can you say about the ...