
multiple environments. A “staging area” may be a file store on Amazon S3 or Azure cloud
storage, or it may be a Hadoop Distributed File System (HDFS). It may also be a relational
database table structure. Staging areas bring the data together under a single concept in
preparation for moving the data downstream.


What Are the Basic Rules of the Data Vault Model?


Data vault modeling has some fundamental rules that must be followed; otherwise the
model no longer qualifies as a data vault model. The full set of rules is taught in a
classroom environment. Some of the rules are listed below:


1. Business keys are separated by grain and semantic meaning. That means customer corporation
and customer individual must be recorded in two separate hub structures.
2. Relationships, events, and intersections across two or more business keys are placed into link
structures.
3. Link structures have no begin or end dates; they are merely an expression of the relationship at the
time the data arrived in the warehouse.
4. Satellites are separated by the type (classification) of data and by rate of change. The type of data
typically corresponds to a single source system.
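
The four rules above can be sketched as table structures. The following is a minimal illustration using Python's built-in sqlite3 module, not a definitive data vault implementation. All table and column names are hypothetical, and the load_date and record_source columns are assumptions drawn from common data vault practice rather than from the rules listed here.

```python
import sqlite3

# Hypothetical schema: every table and column name is illustrative only.
DDL = """
-- Rule 1: separate hubs for business keys of different grain/meaning.
CREATE TABLE hub_customer_corporation (
    hub_corp_key   TEXT PRIMARY KEY,
    corp_number    TEXT NOT NULL UNIQUE,   -- business key
    load_date      TEXT NOT NULL,          -- assumption: arrival timestamp
    record_source  TEXT NOT NULL           -- assumption: originating system
);
CREATE TABLE hub_customer_individual (
    hub_indiv_key  TEXT PRIMARY KEY,
    person_number  TEXT NOT NULL UNIQUE,   -- business key
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL
);
-- Rules 2 and 3: a link joins two or more hubs and carries
-- no begin or end dates, only the arrival (load) date.
CREATE TABLE link_corporation_individual (
    link_key       TEXT PRIMARY KEY,
    hub_corp_key   TEXT NOT NULL REFERENCES hub_customer_corporation(hub_corp_key),
    hub_indiv_key  TEXT NOT NULL REFERENCES hub_customer_individual(hub_indiv_key),
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL,
    UNIQUE (hub_corp_key, hub_indiv_key)
);
-- Rule 4: one satellite per type of data / source system (here a
-- hypothetical CRM feed), with history keyed by load date.
CREATE TABLE sat_individual_crm (
    hub_indiv_key  TEXT NOT NULL REFERENCES hub_customer_individual(hub_indiv_key),
    load_date      TEXT NOT NULL,
    record_source  TEXT NOT NULL,
    full_name      TEXT,
    email          TEXT,
    PRIMARY KEY (hub_indiv_key, load_date)
);
"""


def build_schema(conn):
    """Create the illustrative hub, link, and satellite tables."""
    conn.executescript(DDL)


def table_columns(conn, table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


conn = sqlite3.connect(":memory:")
build_schema(conn)
print(table_columns(conn, "link_corporation_individual"))
```

Note that the link table deliberately has no begin_date or end_date column; in this sketch, any temporal description of the relationship would live elsewhere, such as in a satellite.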

Raw data vault modeling neither allows nor provides for concepts such as conformity,
nor does it deal with supertypes. Those concepts belong to the business vault
models (another form of data vault modeling that is used as an information delivery
layer).


Why Do We Need Many-to-Many Link Structures?


Many-to-many link structures allow the data vault model to be future-proof and extensible.
The relationships expressed in source systems often reflect the business rules or
business execution of today. Relationship definitions have changed over time and will
continue to change. To represent both historical and future data (without reengineering
the model and the load routines), many-to-many relationship tables are necessary.


This is how the Data Vault 2.0 data warehouse can expose the patterns of relationship
changes over time, answering questions such as: where is the gap between “current
requirements” and the “relationships” recorded in history? The many-to-many table (link)
in the raw data vault provides metrics on what percentage of the data is “broken” and
when that data broke the relationship requirement.
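
As a rough illustration of such metrics, the sketch below assumes a hypothetical link between orders and customers and a hypothetical current requirement that each order belongs to exactly one customer. The row layout, the key values, and the use of ISO date strings as load dates are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical link rows: (order_key, customer_key, load_date).
# Each (order_key, customer_key) pair is assumed to appear only once.
LINK_ROWS = [
    ("O1", "C1", "2020-01-01"),
    ("O2", "C1", "2020-01-01"),
    ("O3", "C2", "2020-01-02"),
    ("O3", "C3", "2020-02-15"),  # second customer for O3: the rule breaks here
    ("O4", "C4", "2020-03-01"),
]


def broken_relationship_metrics(rows):
    """Measure how often 'one customer per order' is violated in a link.

    Returns (percent_broken, breaks), where breaks maps each violating
    order key to the load date of its second linked customer, i.e. the
    moment the data first broke the requirement.
    """
    by_order = defaultdict(list)
    # ISO-formatted date strings sort chronologically as plain text.
    for order_key, customer_key, load_date in sorted(rows, key=lambda r: r[2]):
        by_order[order_key].append((customer_key, load_date))

    breaks = {
        order_key: links[1][1]
        for order_key, links in by_order.items()
        if len(links) > 1
    }
    percent_broken = 100.0 * len(breaks) / len(by_order) if by_order else 0.0
    return percent_broken, breaks


percent, breaks = broken_relationship_metrics(LINK_ROWS)
print(percent, breaks)
```

Because the link stores every relationship as it arrived, this kind of measurement needs no change to the model when the business requirement itself changes; only the check applied to the link rows differs.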


Chapter 6.2: Introduction to Data Vault Modeling