Microsoft® SQL Server® 2012 Bible

Chapter 57: Data Mining with Analysis Services

The Decision Tree Viewer, as shown in Figure 57-3, graphically displays the resulting tree for potential bike buyers. The Mining Legend pane displays the details of any selected node, including how the cases break out by the predictable variable. The Dependency Network Viewer is also available for decision trees, displaying both input and predictable columns as nodes with arrows indicating what predicts what. Move the slider to the bottom to see only the most significant predictions. Click a node to highlight its relationships.
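The breakdown that the Mining Legend shows for a node is essentially a count of the predictable column's values among the cases that reach that node. The Python sketch below computes that breakdown for a root node and for the child nodes produced by splitting on one input column. The age bands, bike-buyer flags, and function names are hypothetical, and the code only mimics the kind of numbers the viewer displays; it is not how Analysis Services builds its trees.

```python
from collections import Counter

# Hypothetical toy cases: (age_band, bike_buyer) pairs, for illustration only.
cases = [
    ("<35", 1), ("<35", 1), ("<35", 0), ("<35", 1),
    ("35-55", 0), ("35-55", 1), ("35-55", 0),
    (">55", 0), (">55", 0), (">55", 1),
]

def node_breakdown(node_cases):
    """Return {predictable value: (count, share)} for the cases in one node."""
    counts = Counter(label for _, label in node_cases)
    total = sum(counts.values())
    return {value: (n, n / total) for value, n in sorted(counts.items())}

def split_on_input(node_cases):
    """Group a node's cases into child nodes by the input attribute value."""
    children = {}
    for attr, label in node_cases:
        children.setdefault(attr, []).append((attr, label))
    return children

root = node_breakdown(cases)
children = {band: node_breakdown(group)
            for band, group in split_on_input(cases).items()}

print("root", root)
for band, breakdown in children.items():
    print(band, breakdown)
```

The root node splits evenly between buyers and non-buyers, while the "<35" child leans toward buyers; a split that separates the predictable values this way is what makes a branch of the tree informative.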

Linear Regression

The linear regression algorithm is implemented as a variant of decision trees and is a good choice for continuous data that relates more or less linearly. The result of the regression is an equation of the following form:

$$Y = B_0 + A_1 (X_1 + B_1) + A_2 (X_2 + B_2) + \cdots$$

where $Y$ is the predicted column, the $X_i$ are the input columns, and the $A_i$ and $B_i$ are constants determined by the regression. Because this algorithm is a special case of decision trees, it shares the same mining viewers. You can use the equation directly, or query the mining model with the Predict function.
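To make the form of the equation concrete, this minimal Python sketch evaluates it for a single case. The function name and the coefficient values are hypothetical; in practice the constants come from the trained model.

```python
def regression_predict(b0, terms, x):
    """Evaluate Y = B0 + sum(A_i * (X_i + B_i)).

    b0    -- the constant B0
    terms -- list of (A_i, B_i) pairs, one per input column
    x     -- list of input values X_i, in the same order as terms
    """
    if len(terms) != len(x):
        raise ValueError("need one (A_i, B_i) pair per input value")
    return b0 + sum(a * (xi + b) for (a, b), xi in zip(terms, x))

# Hypothetical coefficients chosen for illustration only.
y = regression_predict(10.0, [(2.0, -1.0), (0.5, 4.0)], [3.0, 6.0])
print(y)  # 10 + 2*(3 - 1) + 0.5*(6 + 4) = 19.0
```

Expanding each term shows that the $B_i$ offsets only shift the intercept: the equation equals $B_0 + \sum_i A_i B_i + \sum_i A_i X_i$, which is an ordinary linear equation in the inputs.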

Clustering

The clustering algorithm functions by gathering similar cases together into groups called clusters and then iteratively refining the cluster definition until no further improvement can be gained. This approach is good for profiling populations. Several viewers display data from the finished model:

■ Cluster Diagram: This viewer displays each cluster as a shaded node, with connecting lines between similar clusters; the darker the line, the more similar the clusters. A slider at the bottom controls how many of these links are shown, so you can limit the diagram to the most similar clusters.

■ Cluster Profiles: Unlike node shading in the Cluster Diagram Viewer, where you can examine one variable value at a time, the Cluster Profiles Viewer shows all variables and clusters in a single matrix. Each cell of the matrix is a graphical representation of that variable's distribution in the given cluster, as shown in Figure 57-4. Discrete variables are shown as stacked bars and continuous variables as diamond charts centered on the mean. The taller the diamond, the less uniform the values.
■ Cluster Characteristics: This viewer lists the characteristics that make up a cluster and the probability that each characteristic appears in it.
■ Cluster Discrimination: Similar to the Cluster Characteristics Viewer, this viewer shows which characteristics favor one cluster over another.
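The assign-then-refine loop described under Clustering, and the per-cluster summaries these viewers draw, can be sketched in Python. This is an illustration under stated assumptions, not the product's algorithm: it uses plain k-means with a deterministic farthest-first start because that is the simplest form of the loop, whereas the Microsoft Clustering algorithm provides its own expectation-maximization and k-means methods. The customer rows, column names, and helper functions are hypothetical, and the two inputs are left unscaled to keep the sketch short.

```python
import statistics
from collections import Counter

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, max_iter=100):
    """Plain k-means: assign each point to its nearest center, move each
    center to the mean of its members, stop once assignments stop changing."""
    # Deterministic farthest-first start: the first point, then repeatedly
    # the point farthest from every center chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    assignment = None
    for _ in range(max_iter):
        new_assignment = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                          for p in points]
        if new_assignment == assignment:
            break  # no case changed cluster: no further improvement
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # leave an emptied cluster's center where it was
                centers[c] = (statistics.mean(m[0] for m in members),
                              statistics.mean(m[1] for m in members))
    return assignment, centers

def profile(rows, assignment, k):
    """Per-cluster summary in the spirit of the Cluster Profiles viewer:
    mean and spread of a continuous column, value shares of a discrete one."""
    result = {}
    for c in range(k):
        members = [r for r, a in zip(rows, assignment) if a == c]
        if not members:
            continue
        ages = [r[0] for r in members]
        counts = Counter(r[2] for r in members)
        result[c] = {
            "age_mean": statistics.mean(ages),     # diamond center
            "age_stdev": statistics.pstdev(ages),  # wider spread, taller diamond
            "commute": {v: n / len(members) for v, n in sorted(counts.items())},
        }
    return result

# Hypothetical customers: (age, income in thousands, commute mode).
customers = [
    (25, 30, "Bike"), (27, 35, "Bike"), (24, 28, "Walk"), (26, 32, "Bike"),
    (52, 90, "Car"), (55, 95, "Car"), (50, 85, "Bike"), (53, 88, "Car"),
]
points = [(age, income) for age, income, _ in customers]
assignment, centers = kmeans(points, k=2)
prof = profile(customers, assignment, k=2)
for c, summary in prof.items():
    print(c, summary)
```

Here cluster 0 is mostly Bike commuters and cluster 1 mostly Car commuters; contrasting shares like these (0.75 versus 0.25 for Bike) is the kind of comparison the Cluster Characteristics and Cluster Discrimination viewers present.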