1152
Part IX: Business Intelligence
■ (^) Type 1: Similar to type 0, history is not tracked. Therefore, if a change occurs to
the source data, the column specifi ed as Type 1 would be updated. For example,
assume that your data warehouse contains a customer dimension that includes Last
Name as one of the columns. If there were a case when a Last Name changed, like
when a woman gets married, then the Last Name for that row would be updated.
■ Type 2: Unlike the fi rst two types, Type 2 does track history. Using that approach,
every change that occurs in the source data for a particular column is tracked
in the dimension table. Therefore, if an existing row changes, a new row will be
inserted into the dimension table that refl ects that value of the changed data. To
accommodate this you need to modify the structure of the table so that you can
identify the row that is active at the time of the fact load. In some cases a simple
bit fl ag is added to the table. However, if you want a more robust solution, one that
allows you to identify when a row is active, you need to add two columns to your
table. One column, start date, defi nes when the row was modifi ed; and the other
column, end date, specifi es when another change occurs. As the fact is loaded, the
ETL process can either use the bit fl ag to identify which row to use, or it can use
the start and end dates.
Accommodating the varying uses of these types presents certain challenges when loading
dimensions and facts, unless every column in the dimension must be specifi ed as Type 0.
On the other hand, if one column is designated as Type 1 or Type 2, then the load process
becomes complicated. The primary challenge with Type 2 attributes is creating a process
that can detect changes to a given attribute. You can accomplish this with T-SQL by using
the CHECKSUM or HASHBYTES functions. Although they both offer a certain level of confi -
dence, neither guarantees 100 percent change detection.
With a Type 2 dimension, you must not detect only changes, but you must also incorporate
a process that expires the old row and inserts the new row. The process must fi rst detect
if the row currently exists in the dimension. If it does not, then the process must simply
insert a new row. Depending on whether you include the bit fl ag, the start and end date,
or both, you must include them during the insertion. If you included only the bit fl ag, it
would be set to on when a new row is added. When you include the start and end date, the
start date is when the row is fi rst inserted, and the end date is some date in the far future.
Figure 51-5 illustrates the use of both methods. If it does exist, then the existing row must
be expired, by turning the bit fl ag off and updating the end date, which is shown in Figure
51-6. Finally, a new row will be inserted refl ecting the changed data, with the active fl ag
on and setting the start and end dates.
FIGURE 51-5
When a new row is inserted into a type 2 dimension, the bit fl ag is set to on, and the start and
end date are set.
c51.indd 1152c51.indd 1152 7/31/2012 10:28:43 AM7/31/2012 10:28:43 AM
http://www.it-ebooks.info