Microsoft® SQL Server® 2012 Bible

(Ben Green) #1



Part IX: Business Intelligence


■ Time Series or Forecasting: Predicts what a time series looks like in the future.
For example, what revenue do we expect in the upcoming quarter?
■ Sequence Analysis: Determines what items tend to occur together in a specific
order. In what order are products normally purchased?
These categories are helpful when you think about what you can use data mining for, but
with increased comfort level and experience, many other applications are possible.
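As a toy illustration of the sequence-analysis question above, the following Python sketch counts how often one product is bought immediately after another. The purchase histories and the consecutive_pairs helper are hypothetical, and this simple count is not the algorithm Analysis Services uses; it only shows the kind of ordered pattern such a model looks for.

```python
from collections import Counter

# Hypothetical purchase histories: one list per customer, in purchase order.
orders = [
    ["bike", "helmet", "lock"],
    ["bike", "helmet", "gloves"],
    ["bike", "helmet"],
]

def consecutive_pairs(sequences):
    """Count how often item a is immediately followed by item b."""
    counts = Counter()
    for seq in sequences:
        # zip(seq, seq[1:]) yields each item paired with its successor.
        counts.update(zip(seq, seq[1:]))
    return counts

pair_counts = consecutive_pairs(orders)
most_common_pair, frequency = pair_counts.most_common(1)[0]
print(most_common_pair, frequency)
```

Here the pair ("bike", "helmet") occurs in all three histories, so it surfaces as the most frequent ordered purchase.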


The Data Mining Process


A traditional practice in data mining is to train a data mining model using existing data
for which an outcome is already known and then use that model to predict the outcome of
new data. This requires several steps, only some of which happen within Analysis Services:

■ Business and data understanding: Understand the important questions and the
available data to answer those questions. Insights gained must be relevant to
business goals to be of use. Data must be of acceptable quality and relevance to
obtain reliable answers.
■ Prepare data: Preparing data can be a simple or difficult task depending on the
current state of the data. Some of the tasks to consider include the following:
    ■ Eliminate rows of low data quality. The measure of quality is domain-specific.
    Eliminate values outside of expected norms, or failing any test that proves the
    row describes an impossible or highly improbable case.
    ■ Eliminate duplicates, invalid values, or inconsistent values.
    ■ Denormalize data by creating views that produce a single “case” table.
    ■ Erratic time series data may benefit from smoothing to remove dramatic
    variations.
    ■ Derived attributes, such as profit, can be useful in the modeling process.
■ Model: You build Analysis Services models by first defining a data mining structure
that specifies the tables to use as input. Then, add data mining models (different
algorithms) to the structure. Use the training data to simultaneously train all the
models within the structure.
■ Evaluate: The Analysis Services Mining Accuracy Chart simplifies evaluating the
accuracy and usefulness of the candidate mining models. Use the testing data set to
understand the expected accuracy of each model and compare it to
business needs.
■ Deploy: Integrate prediction queries into applications to predict the outcomes of
interest.
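The prepare, model, and evaluate steps above can be sketched outside Analysis Services in a few lines of Python. This is a minimal illustration under stated assumptions, not how Analysis Services implements any of these steps: the case rows, the column names (segment, revenue, cost, churned), and the helpers prepare, moving_average, train, and accuracy are all hypothetical, and the "model" is just a majority vote per segment.

```python
from collections import Counter, defaultdict

# Hypothetical case table; every column name and value is invented.
raw_cases = [
    {"id": 1, "segment": "retail", "revenue": 120.0, "cost": 80.0, "churned": False},
    {"id": 2, "segment": "retail", "revenue": 90.0, "cost": 70.0, "churned": False},
    {"id": 3, "segment": "wholesale", "revenue": 300.0, "cost": 310.0, "churned": True},
    {"id": 4, "segment": "wholesale", "revenue": 250.0, "cost": 240.0, "churned": True},
    {"id": 2, "segment": "retail", "revenue": 90.0, "cost": 70.0, "churned": False},
    {"id": 5, "segment": "retail", "revenue": -50.0, "cost": 20.0, "churned": False},
    {"id": 6, "segment": "retail", "revenue": 110.0, "cost": 60.0, "churned": False},
    {"id": 7, "segment": "wholesale", "revenue": 400.0, "cost": 390.0, "churned": True},
    {"id": 8, "segment": "retail", "revenue": 95.0, "cost": 85.0, "churned": False},
    {"id": 9, "segment": "wholesale", "revenue": 280.0, "cost": 300.0, "churned": True},
]

def prepare(rows):
    """Drop duplicate ids and impossible rows, then add a derived attribute."""
    seen, cleaned = set(), []
    for row in rows:
        if row["id"] in seen:
            continue  # duplicate case
        if row["revenue"] < 0 or row["cost"] < 0:
            continue  # negative amounts are treated as impossible here
        seen.add(row["id"])
        cleaned.append({**row, "profit": row["revenue"] - row["cost"]})
    return cleaned

def moving_average(values, window=3):
    """Smooth an erratic series with a trailing moving average."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def train(rows, key="segment", target="churned"):
    """Toy model: remember the most common outcome for each key value."""
    votes = defaultdict(Counter)
    for row in rows:
        votes[row[key]][row[target]] += 1
    return {value: c.most_common(1)[0][0] for value, c in votes.items()}

def accuracy(model, rows, key="segment", target="churned"):
    """Fraction of rows whose known outcome matches the prediction."""
    hits = sum(model.get(row[key]) == row[target] for row in rows)
    return hits / len(rows)

cleaned = prepare(raw_cases)

# Hold out every fourth row for testing. A real holdout is usually random;
# a fixed rule keeps this sketch repeatable.
train_rows = [r for i, r in enumerate(cleaned) if i % 4 != 3]
test_rows = [r for i, r in enumerate(cleaned) if i % 4 == 3]

model = train(train_rows)
print("test accuracy:", accuracy(model, test_rows))
print("smoothed:", moving_average([10, 40, 10]))
```

The key design point is that the outcome column is known for both sets, but only the training rows shape the model; the held-out rows exist purely to estimate how the model would behave on new data.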
The process may just iterate between prepare/model/evaluate cycles. At the other end of
the spectrum, an application may build, train, and query a model to accomplish a task,