1182
Part IX: Business Intelligence
Transform Defi nition/Use
Percentage Sampling Splits a data fl ow by randomly sampling the rows for a given percent-
age. For example, this could be used to separate a data set into train-
ing and testing sets for data mining.
Pivot Denormalizes a data fl ow, similar to the way an Excel pivot table oper-
ates, making attribute values into columns.
Row Count Counts the number of rows in a data fl ow and places the result into a
variable. Confi gure by populating the VariableName property.
Row Sampling Nearly identical to the Percentage Sampling transform, except that
the approximate number of rows to be sampled is entered, rather
than the percentage of rows.
Script Component Using a script as a transformation enables transformations with com-
plex logic to act on a data fl ow.
Slowly Changing
Dimension
Compares the data in a data fl ow to a dimension table, and, based on
the roles assigned to particular columns, maintains the dimension.
This component is unusual in that it does not have an editor; instead,
a wizard guides you through the steps to defi ne column roles and
interactions with the dimension table. At the conclusion of the wizard,
several components are placed on the design surface to accomplish
the dimension maintenance task.
Sort Sorts the rows in a data fl ow by selected columns. Confi gure by
selecting the columns to sort by. Then, in the lower pane, choose the
sort type, the sort order, and the comparison fl ags appropriate to the
data being sorted. This is a fully blocking transform; it can adversely
affect performance.
Term Extraction Builds a new data fl ow based on terms it fi nds in a Unicode text col-
umn (DT_WSTR or DT_NTEXT). This is the training part of text mining,
whereby strings of a particular type generate a list of commonly used
terms, which is later used by the Term Lookup component to identify
similar strings.
Term Lookup Provides a “join” between a Unicode text column (DT_WSTR or DT_
NTEXT) in the data fl ow and a reference table of terms built by the
Term Extraction component.
Union All Combines rows from multiple data fl ows into a single data fl ow,
assuming the source columns are of compatible types. Confi gure by
connecting as many data fl ows as needed to the component. Then,
using the editor, ensure that the correct columns from each data fl ow
are mapped to the appropriate output column.
Unpivot Makes a data fl ow more normalized by turning columns into attribute
values. For example, a data fl ow with one row for each quarter and a
column for revenue by region could be turned into a three-column
data fl ow: Quarter, Region, and Revenue.
TABLE 52- 4 (continued)
c52.indd 1182c52.indd 1182 7/31/2012 10:29:30 AM7/31/2012 10:29:30 AM
http://www.it-ebooks.info