Microsoft® SQL Server® 2012 Bible




Chapter 57: Data Mining with Analysis Services




The Decision Tree Viewer, as shown in Figure 57-3, graphically displays the resulting tree
for potential bike buyers. The Mining Legend pane displays the details of any selected
node, including how the cases break out by the predictable variable. The Dependency
Network Viewer is also available for decision trees, displaying both input and predictable
columns as nodes with arrows indicating what predicts what. Move the slider to
the bottom to see only the most significant predictions. Click a node to highlight its
relationships.

Linear Regression
The linear regression algorithm is implemented as a variant of decision trees and is a good
choice for continuous data that relates more or less linearly. The result of the regression is
an equation in the following form:

Y = B0 + A1*(X1 + B1) + A2*(X2 + B2) + ...

where Y is the predicted column, Xi are the input columns, and Ai and Bi are constants
determined by the regression. Because this algorithm is a special case of decision trees, it
shares the same mining viewers. You can use the equation directly or query the mining
model with the Predict function.
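
To make the form of the equation concrete, the following Python sketch evaluates it outside the mining model. It is only an illustration: the function name regression_predict and the constant values are hypothetical and are not taken from any trained model or from Analysis Services itself.

```python
from typing import Sequence


def regression_predict(b0: float,
                       a: Sequence[float],
                       b: Sequence[float],
                       x: Sequence[float]) -> float:
    """Evaluate Y = B0 + A1*(X1 + B1) + A2*(X2 + B2) + ...

    b0 is the intercept, a and b hold the per-input constants Ai and Bi,
    and x holds the input column values Xi for one case.
    """
    if not (len(a) == len(b) == len(x)):
        raise ValueError("a, b, and x must have the same length")
    return b0 + sum(ai * (xi + bi) for ai, bi, xi in zip(a, b, x))


# Hypothetical constants, chosen only to show the arithmetic.
B0 = 10.0
A = [2.0, 0.5]
B = [-3.0, -40.0]

# One case with two input columns.
y = regression_predict(B0, A, B, [5.0, 50.0])
print(y)  # 10 + 2*(5 - 3) + 0.5*(50 - 40) = 19.0
```

In practice the constants would be read from the trained model through the mining viewers, and the sketch only shows how they combine once you have them.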

Clustering
The clustering algorithm functions by gathering similar cases together into groups called
clusters and then iteratively refining the cluster definition until no further improvement
can be gained. This approach is good for profiling populations. Several viewers display data
from the finished model:

■ Cluster Diagram: This viewer displays each cluster as a shaded node with connecting
lines between similar clusters; the darker the line, the more similar the clusters
it joins. You can use a slider at the bottom to show more similar clusters.

■ Cluster Profiles: Unlike node shading in the Cluster Diagram Viewer, where you
can examine one variable value at a time, the Cluster Profiles Viewer shows all
variables and clusters in a single matrix. Each cell of the matrix is a graphical
representation of that variable’s distribution in the given cluster, as shown in
Figure 57-4. Discrete variables are shown as stacked bars and continuous variables
as diamond charts centered on the mean. The taller the diamond, the less uniform
the values.
■ Cluster Characteristics: This view displays the list of characteristics that make up
a cluster and the probability that each characteristic appears.
■ Cluster Discrimination: Similar to the Characteristics Viewer, this shows which
characteristics favor one cluster versus another.
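
The iterative refinement described at the start of this section can be sketched in Python as follows. This is a minimal k-means style loop used as a stand-in for illustration; it is not the Analysis Services implementation, which may use a different method. The function names, the sample cases, and the starting centers are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points: Sequence[Point], centers: Sequence[Point]) -> List[int]:
    """Give each case the index of its nearest cluster center."""
    return [min(range(len(centers)),
                key=lambda k: squared_distance(p, centers[k]))
            for p in points]


def recompute(points: Sequence[Point], labels: Sequence[int],
              centers: Sequence[Point]) -> List[Point]:
    """Move each center to the mean of its member cases."""
    new_centers: List[Point] = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:  # keep an empty cluster where it was
            new_centers.append(old)
            continue
        new_centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(len(old))))
    return new_centers


def cluster(points: Sequence[Point], initial_centers: Sequence[Point],
            max_iterations: int = 100) -> Tuple[List[int], List[Point]]:
    """Refine the clusters until the assignments stop changing."""
    centers = [tuple(c) for c in initial_centers]
    labels = assign(points, centers)
    for _ in range(max_iterations):
        centers = recompute(points, labels, centers)
        new_labels = assign(points, centers)
        if new_labels == labels:  # no further improvement
            break
        labels = new_labels
    return labels, centers


# Hypothetical cases: (age, yearly income in thousands).
cases = [(25.0, 30.0), (27.0, 32.0), (30.0, 35.0),
         (55.0, 90.0), (58.0, 95.0), (60.0, 100.0)]
labels, centers = cluster(cases, initial_centers=[cases[0], cases[2]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The third case starts in the second cluster and moves to the first after one refinement pass, which is the kind of adjustment the algorithm repeats until the clusters settle.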