
A third case for the staging area arises when data coming from the operational
environment must pass through a preprocessing step. In that step, the data pass
through an edit and correction process.
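
A preprocessing step of this kind can be illustrated with a minimal Python sketch. The record layout (a "name" and an "amount" field) and the specific rules are assumptions made for illustration only; a real edit and correction process would apply whatever rules the operational data call for.

```python
from typing import Dict, Iterable, List, Tuple


def preprocess(records: Iterable[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Split raw operational records into corrected and rejected lists.

    The field names and rules here are hypothetical examples.
    """
    clean: List[Dict] = []
    rejected: List[Dict] = []
    for rec in records:
        fixed = dict(rec)  # work on a copy; leave the source record untouched
        # Correction: trim stray whitespace and normalize capitalization.
        fixed["name"] = str(fixed.get("name", "")).strip().title()
        # Edit check: the amount must parse as a number.
        try:
            fixed["amount"] = float(fixed.get("amount"))
        except (TypeError, ValueError):
            rejected.append(rec)
            continue
        # Edit check: negative amounts are treated as invalid in this sketch.
        if fixed["amount"] < 0:
            rejected.append(rec)
            continue
        clean.append(fixed)
    return clean, rejected
```

Only the records in the first list would move on toward transformation; the rejected records stay behind for inspection or repair.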


One issue with the staging area is whether analytic processing can be done against the
data found there. As a rule, data in the staging area are not used for analytic
processing, because they have not yet passed through the transformation process. Until
the data have been transformed, it does not make sense to run analytic processing
against them.


Note that a staging area is optional and most organizations do not need one.


Changed Data Capture


Yet another variation on the classical interface between operational systems and data
warehouse systems is what is termed the CDC option. “CDC” stands for “changed
data capture.” For high-performance online transaction environments, it is difficult or
inefficient to scan the entire database every time data need to be refreshed into the data
warehouse environment. In these environments, it makes sense to determine what data
need to be updated into the data warehouse by examining the log tape or journal tape.
The log tape is created for the purposes of online backup and recovery in the event
of a failure during online transaction processing. But the log tape also contains all the
data that need to be updated into the data warehouse. The log tape is read offline and is
used to gather the data that need to be updated into the data warehouse.
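
The idea of reading the log instead of scanning the database can be sketched in a few lines of Python. The LogRecord layout below is a hypothetical simplification; real DBMS logs have vendor-specific formats and are read with vendor tools. The sketch assumes each log record carries a sequence number, an operation type, a key, and the after-image of the row.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class LogRecord:
    seq: int              # position of the record in the log (assumed field)
    op: str               # "insert", "update", or "delete"
    key: str              # primary key of the affected row
    row: Optional[dict]   # after-image of the row; None for a delete


def capture_changes(
    log: Iterable[LogRecord], last_seq: int
) -> Tuple[Dict[str, LogRecord], int]:
    """Return the latest change per key for records after last_seq.

    Also returns the highest sequence number seen, so the next run
    can resume from that point instead of rereading the whole log.
    """
    net: Dict[str, LogRecord] = {}
    high = last_seq
    for rec in sorted(log, key=lambda r: r.seq):
        if rec.seq <= last_seq:
            continue          # already applied to the warehouse
        net[rec.key] = rec    # later records overwrite earlier ones
        high = rec.seq
    return net, high
```

Keeping only the last record per key means a row that changed several times between refreshes is carried to the data warehouse once. Whether that collapsing is appropriate depends on whether the warehouse needs every intermediate state; this sketch assumes it does not.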


Fig. 8.3.5 depicts the CDC option.


Chapter 8.3: The Data Warehouse/Operational Environment Interface