1151
Chapter 51: Business Intelligence Database Design
51
As with the dimensions tables, you should copy the source data for the fact tables into a
staging table. The source data for the facts should contain the measures that you want to
load into the fact table and the primary keys for the dimensions related to the facts. The
code to load the fact table looks like this:
INSERT INTO Warehouse.dbo.FactSales(CustomerSK, ProductSK, TotalDue)
SELECT
ISNULL(CustomerSK, -1),
ISNULL(ProductSK, -1),
TotalDue
FROM Staging.dbo.Sales s
LEFT OUTER JOIN Warehouse.dbo.DimProduct p
ON s.ProductID = p.ProductID
LEFT OUTER JOIN Warehouse.dbo.DimCustomer c
ON s.CustomerID = c.CustomerID
You may wonder about the use of ISNULL in the SELECT and why LEFT OUTER JOIN is
used. The purpose of LEFT OUTER JOIN is to identify any rows in the source data unre-
lated to any dimensions. Then in the SELECT if the value is NULL, it is set to –1. Why –1?
As mentioned in the “Ensuring Consistency within a Data Warehouse” section, you should
insert a row into each dimension in an attempt to establish and maintain consistency for
invalid or missing data. This is where that row becomes valid. Instead of allowing the iden-
tity value of the surrogate key to set the value, when inserting the row you should set the
value to a number that is consistent across all dimensions. In this case –1 was selected;
therefore it is specifi ed for missing rows.
Changing Data in Dimensions
In the “Loading Dimensions” section, a brief reference was given to loading historical data
into dimensions. This is a common scenario encountered when building the ETL process for
a data warehouse. When defi ning the requirements for your data warehouse, this is some-
thing that should defi nitely be addressed with your user population. The nomenclature that
is used when defi ning dimensions can be somewhat confusing. Dimensions are often cat-
egorized into types: Type 0, Type 1, and Type 2. This is not an exhaustive list. However, for
the sake of brevity, you can focus on the aforementioned list.
This typing of dimensions can be confusing because it is not the entire dimension that is
typed, but instead each column (attribute) has a type of its own. During the initial load,
a single row for each unique row is loaded into the dimension. The type of dimension can
determine what, if any, operation will be performed on that row. The following is a brief
description of the types:
■ Type 0: History is not tracked and changes to any column are completely ignored.
For example, you may have an Employee dimension that contains a Social Security
number. This is a value that usually does not change. In this case if a change did
occur you could defi ne this column as Type 0, and if a change occurs, the update
would be ignored in the ETL process.
c51.indd 1151c51.indd 1151 7/31/2012 10:28:42 AM7/31/2012 10:28:42 AM
http://www.it-ebooks.info