
A properly designed Data Vault architecture no longer suffers reengineering because of changing requirements.


Why Does Reengineering Happen Because of Big Data?


Reengineering, redesign, and rearchitecture happen because big data pushes three of the four axes shown in Fig. 6.5.1. The more processing that must happen in smaller and smaller time frames, the more highly optimized the design must be. The more variety that must be handled in smaller time frames, the more highly optimized the design must be. Finally, the more volume that must be processed in smaller time frames (you guessed it), the more highly optimized the design must be.
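
One way to read this combined pressure, sketched here as a rate constraint rather than a formula from the text, is that the work per record grows with variety, the number of records is the volume, and both must fit inside the allotted time frame:

\[
R_{\text{required}} = \frac{V \cdot c_{\text{variety}}}{T}
\]

where $V$ is the arriving volume, $c_{\text{variety}}$ is an assumed per-record processing cost that rises with structural complexity, and $T$ is the allotted time frame. Redesign pressure appears whenever $R_{\text{required}}$ exceeds what the current process design can sustain.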


Fortunately for the community, there is a finite set of process designs that have been proven to work at scale; by leveraging MPP, scale-free mathematics, and set logic, these designs work for both small and extremely large volumes without redesign.
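
As an illustration of the set-logic point (a hypothetical sketch, not a design from the text), a load expressed as a single set operation carries no per-row procedural logic to rework, so the same design holds at any volume and scaling is delegated to the platform (e.g., an MPP engine):

```python
# Minimal sketch: delta detection as a pure set operation. The design is
# identical for ten rows or ten billion; only the execution platform changes.

def delta_load(staged: set[str], warehouse: set[str]) -> set[str]:
    """Return only the business keys not yet present in the warehouse."""
    return staged - warehouse  # declarative set difference: volume-agnostic

# Usage: the same statement at any scale.
existing = {"K1", "K2"}
arriving = {"K2", "K3", "K4"}
print(delta_load(arriving, existing))  # {'K3', 'K4'}
```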


Fig. 6.5.1 contains four axis labels: velocity, volume, time, and variety. In this figure, velocity is the speed of arrival of the data (i.e., latency of arrival); volume is the overall size of the data (on arrival at the warehouse); variety is the structured, semistructured, multistructured, or nonstructured classification of the data; and time is the allotted time frame in which to accomplish the given task (e.g., loading to the data warehouse). Let's examine a case study of how this drives reengineering or even conditional architecture.


Scenario #1: Ten rows of data arrive every 24 hours, highly structured (tab delimited, with a fixed number of columns). The requirement is to load the data to the data warehouse within a 6-hour window. The question is as follows: how many different architectures or process designs can be put together to accomplish this task? For the sake of argument, let's say there are 100 possibilities (including typing the data in by hand, or entering it into Excel and then loading it to the database).
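
A quick check with the scenario's own numbers shows why so many designs qualify; the required throughput is far below what even the crudest approach can sustain:

```python
# Scenario #1 arithmetic (numbers from the text): 10 rows every 24 hours,
# to be loaded within a 6-hour window.
rows = 10
window_hours = 6
required_rate = rows / window_hours  # rows per hour

print(f"required throughput: {required_rate:.2f} rows/hour")  # ~1.67
# At under 2 rows per hour, essentially any of the ~100 candidate designs
# meets the window; the time axis exerts no design pressure here.
```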

