Data Mining: Practical Machine Learning Tools and Techniques, Second Edition


10.2 EXPLORING THE EXPLORER 387


make sense for numeric prediction). Section 5.8 (Table 5.8) explains the
meaning of the various measures.
Ordinary linear regression (Section 4.6), another scheme for numeric prediction, is found under LinearRegression in the functions section of the menu in Figure 10.4(a). It builds a single linear regression model rather than the two in Figure 10.11; not surprisingly, its performance is slightly worse.
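To make the idea behind ordinary linear regression concrete, here is a minimal sketch of least-squares fitting in plain Python. This is not Weka's LinearRegression implementation, and the data points are invented for illustration; it only shows how the intercept and slope of a single-attribute model are obtained by minimizing squared error.

```python
# Minimal sketch of ordinary least-squares regression (Section 4.6),
# written from scratch rather than via Weka's LinearRegression class.

def fit_simple_regression(xs, ys):
    """Fit y = intercept + slope * x by minimizing squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Made-up data points, roughly on a line:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
intercept, slope = fit_simple_regression(xs, ys)
```

M5', by contrast, splits the instance space first and fits a separate linear model of this kind in each region, which is why Figure 10.11 shows two models rather than one.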
To get a feel for their relative performance, let’s visualize the errors these
schemes make, as we did for the Iris dataset in Figure 10.6(b). Right-click the
entry in the history list and select Visualize classifier errors to bring up the two-
dimensional plot of the data in Figure 10.12. The points are color coded by
class—but in this case the color varies continuously because the class is numeric.
In Figure 10.12 the Vendor attribute has been selected for the X-axis and the
instance number has been chosen for the Y-axis because this gives a good spread
of points. Each data point is marked by a cross whose size indicates the absolute
value of the error for that instance. The smaller crosses in Figure 10.12(a) (for M5′), when compared with those in Figure 10.12(b) (for linear regression), show that M5′ is superior.
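The quantity the cross size encodes can be sketched in a few lines: for each instance it is the absolute error, |actual - predicted|, and smaller crosses overall correspond to a smaller mean absolute error. The numbers below are invented, not taken from the CPU dataset.

```python
# Sketch of what the error visualization encodes: each instance's cross size
# is proportional to its absolute error.  Values are made up for illustration.

actual      = [100.0, 250.0, 40.0]
pred_m5     = [ 95.0, 240.0, 45.0]   # hypothetical M5' predictions
pred_linreg = [ 80.0, 300.0, 55.0]   # hypothetical linear regression predictions

abs_err_m5     = [abs(a - p) for a, p in zip(actual, pred_m5)]
abs_err_linreg = [abs(a - p) for a, p in zip(actual, pred_linreg)]

# Smaller crosses overall correspond to a smaller mean absolute error:
mae_m5     = sum(abs_err_m5) / len(abs_err_m5)
mae_linreg = sum(abs_err_linreg) / len(abs_err_linreg)
```

With these invented predictions the M5' crosses would be uniformly smaller, mirroring the comparison between panels (a) and (b) of Figure 10.12.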

      + 0.012  * MYCT
      + 0.0162 * MMIN
      + 0.0086 * MMAX
      + 0.8332 * CACH
      - 1.2665 * CHMIN
      - 1.2741 * CHMAX
      - 107.243

    Number of Rules : 2

    Time taken to build model: 1.37 seconds

    === Cross-validation ===
    === Summary ===

    Correlation coefficient          0.9766
    Mean absolute error             13.6917
    Root mean squared error         35.3003
    Relative absolute error         15.6194 %
    Root relative squared error     22.8092 %
    Total Number of Instances      209

Figure 10.11 (continued)
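The evaluation measures in the summary above are the standard ones for numeric prediction from Section 5.8 (Table 5.8). The sketch below computes each of them by hand on made-up actual/predicted values; it is not Weka code, and the relative measures here divide by the error of always predicting the mean of these same actual values, which is a simplification of how Weka derives its baseline.

```python
import math

# Sketch of the numeric-prediction measures in the cross-validation summary
# (Section 5.8, Table 5.8), computed on invented actual/predicted values.

actual    = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
n = len(actual)
mean_a = sum(actual) / n

# Mean absolute error and root mean squared error:
mae  = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

# Relative measures: error relative to always predicting the mean:
rae  = mae  / (sum(abs(a - mean_a) for a in actual) / n)
rrse = rmse / math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)

# Correlation coefficient between actual and predicted values:
mean_p = sum(predicted) / n
cov  = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
corr = cov / math.sqrt(sum((a - mean_a) ** 2 for a in actual)
                       * sum((p - mean_p) ** 2 for p in predicted))
```

A correlation coefficient near 1 and relative errors well below 100%, as in the output above, indicate predictions that track the actual values far better than the mean-value baseline.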
