

that a hub is defined as a unique list of business keys. The preference is to use natural
business keys that have meaning to the business.
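As a minimal sketch of the idea that a hub is a unique list of business keys, the Python below loads keys into an in-memory hub and ignores any key that is already present. The function name `load_hub`, the trim-and-uppercase normalization, and the `load_ts` and `record_source` fields are assumptions made for illustration; they are not prescribed by the text above.

```python
from datetime import datetime, timezone


def load_hub(hub, incoming_keys, record_source):
    """Insert each business key into the hub once; skip keys already present."""
    load_ts = datetime.now(timezone.utc)
    for key in incoming_keys:
        # Assumed normalization: trim whitespace and uppercase before comparing.
        normalized = key.strip().upper()
        if normalized and normalized not in hub:
            hub[normalized] = {"load_ts": load_ts, "record_source": record_source}
    return hub


# Hypothetical customer numbers arriving from two hypothetical sources.
hub_customer = {}
load_hub(hub_customer, ["C-1001", "c-1002 ", "C-1001"], "crm")
load_hub(hub_customer, ["C-1002", "C-1003"], "billing")

print(sorted(hub_customer))
```

Here the hub ends up with three distinct keys even though five values were received, and the key first seen from the hypothetical `crm` source keeps that record source when it arrives again from `billing`.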


One of the functions of a properly built raw Data Vault 2.0 model is to provide
traceability across the lines of business. To do this, the business keys must be stored in
the hub structures according to a set of design standards.


Most of the business keys in source systems today are surrogate sequence numbers
defined by the source application. The world is full of these “dumb” machine-generated
numeric values. Examples include customer number, account number, invoice number,
and order number.


Source System Sequence Business Keys


Source system sequence-driven business keys make up 98% of the source data that any
data warehouse or analytic system receives. Even transaction IDs, e-mail IDs, and the
identifiers in some unstructured data sets, such as document IDs, are surrogates. The
theory is that these sequences should never change and should represent the same data
once established and assigned.


That said, the largest problem in the operational systems is one the analytic solution
is always asked to solve: how to integrate (or master) the data set, combining it across
business processes and making sense of data that have been assigned multiple sequence
business keys throughout the business life cycle.


An example of this is the customer account. A customer account in SAP may mean the
same thing as a customer account in Oracle Financials or some other customer relationship
management (CRM) or enterprise resource planning (ERP) solution. When the data are
passed from SAP to Oracle Financials, the receiving OLTP application typically
assigns a new “business key” or surrogate sequence ID. It is still the same customer
account; however, the same representative data set now has a new key.
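The re-keying described above can be illustrated with a small Python sketch. The class `SourceSystem`, the names `system_a` and `system_b`, the starting sequence values, and the account attributes are all hypothetical; nothing here models the actual behavior or API of SAP or Oracle Financials.

```python
from itertools import count


class SourceSystem:
    """Toy OLTP application that assigns its own surrogate sequence IDs."""

    def __init__(self, name, start):
        self.name = name
        self._sequence = count(start)  # independent sequence per system
        self.accounts = {}

    def receive(self, account):
        """Store a copy of the account under a newly generated surrogate key."""
        key = next(self._sequence)
        self.accounts[key] = dict(account)
        return key


# One real-world customer account, described by hypothetical attributes.
account = {"name": "Acme Corp", "postal_code": "80202"}

system_a = SourceSystem("system_a", start=1000)
system_b = SourceSystem("system_b", start=500000)

key_a = system_a.receive(account)
# The account is passed downstream; the receiving system re-keys it.
key_b = system_b.receive(system_a.accounts[key_a])

print(key_a, key_b)
print(system_a.accounts[key_a] == system_b.accounts[key_b])
```

The attributes are identical in both systems, yet the two keys differ, so a join on the surrogate key alone cannot recognize that the records describe the same account.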


The issue becomes: how do you put the records back together again? This is a master
data management (MDM) question. With an MDM solution in place (including good
governance and good people), it can be solved, or at least approximated, with deep
learning and neural networks. Even statistical analysis of “similar attributes” can detect,
within a margin of error, the multiple records that “should” be the same but contain different keys.
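A rough sketch of the “similar attributes” idea follows. It uses the standard library's `difflib.SequenceMatcher` as a simple stand-in for the statistical analysis mentioned above; it is not a deep learning approach and not an MDM product. The sample records, the compared fields, and the 0.85 threshold are assumptions chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 string similarity ratio, ignoring letter case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(rec_a, rec_b, fields):
    """Average the per-field similarity across the compared attributes."""
    return sum(similarity(rec_a[f], rec_b[f]) for f in fields) / len(fields)


def candidate_matches(left, right, fields, threshold=0.85):
    """Pair records from two systems whose attributes look alike."""
    pairs = []
    for key_l, rec_l in left.items():
        for key_r, rec_r in right.items():
            score = match_score(rec_l, rec_r, fields)
            if score >= threshold:  # assumed cutoff; tune for real data
                pairs.append((key_l, key_r, round(score, 2)))
    return pairs


# Hypothetical records keyed by each system's own surrogate sequence ID.
system_a = {
    1000: {"name": "Acme Corp", "city": "Denver"},
    1001: {"name": "Globex Inc", "city": "Austin"},
}
system_b = {
    500000: {"name": "ACME Corp.", "city": "Denver"},
    500001: {"name": "Initech LLC", "city": "Dallas"},
}

pairs = candidate_matches(system_a, system_b, ["name", "city"])
for key_l, key_r, score in pairs:
    print(f"{key_l} <-> {key_r} (score {score})")
```

The comparison is pairwise across every record in both systems, which is acceptable for a demonstration but would need blocking or indexing at realistic volumes. Any pair that clears the threshold is only a candidate match within a margin of error, which is why the governance and people side of MDM still matters.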


This business problem perpetuates into the data warehouse and analytic solution typically


Chapter 6.2: Introduction to Data Vault Modeling