
there are cathode ray tubes (CRTs) emanating from an application, the diagram is
representative of online transaction processing systems. In reality, there are MANY
applications and MANY databases represented by the application, database, and CRTs.


The diagram shows that there are two major types of big data: repetitive data and
nonrepetitive data. Repetitive data is further divided into simple repetitive data and
context-enriched repetitive data.


The typical sources of the different types of big data are shown as well.


The diagram shows that repetitive data are distilled into data that can be placed into the
analytic data warehouse environment. In addition, nonrepetitive data can be
disambiguated and placed either in the data warehouse or back into big data as context-
enriched repetitive big data.
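
The flow described above can be sketched in a few lines of Python. This is only an illustration of the taxonomy and the destinations named in the text; the names BigDataKind and process_and_destinations are hypothetical and do not come from the diagram itself.

```python
from enum import Enum


class BigDataKind(Enum):
    """The kinds of big data named in the text (enum name is hypothetical)."""
    SIMPLE_REPETITIVE = "simple repetitive"
    CONTEXT_ENRICHED_REPETITIVE = "context-enriched repetitive"
    NONREPETITIVE = "nonrepetitive"


def process_and_destinations(kind):
    """Return the process applied to a kind of data and where its output may go."""
    if kind is BigDataKind.NONREPETITIVE:
        # Nonrepetitive data is disambiguated, then placed in the warehouse
        # or returned to big data as context-enriched repetitive data.
        return ("disambiguation",
                ["data warehouse", "big data (context-enriched repetitive)"])
    # Both kinds of repetitive data are distilled for the data warehouse.
    return ("distillation", ["data warehouse"])


for kind in BigDataKind:
    process, destinations = process_and_destinations(kind)
    print(f"{kind.value}: {process} -> {', '.join(destinations)}")
```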


Redundancy


The diagram raises many issues. One of them is redundant data. At first glance,
redundant data appears to be everywhere in the diagram.


In fact, much of this data has been transformed. If a value remains the same after
transformation, you may want to consider the data to be redundant. Then
again, you may not.


Consider redundancy in the real world. Take the time of day. You can find the time of
day on the Internet, on the telephone, on the radio, on television, and many other places,
for that matter. Does the fact that time of day appears redundantly in many places
become a bother? The only time it becomes a bother is if there is no way to determine
what the accurate time is. If there were no definitive source of time, then having time
appear redundantly would be a problem. But as long as there is some definitive source
somewhere and as long as most redundant sources adhere to that definitive source, then
there is no problem. In fact, having redundant sources of time is actually quite helpful, as
long as there is no problem with the integrity of that time.


Therefore, having redundant data across the enterprise as seen in Fig. 8.4.1 is not an issue
as long as the integrity of the data is not an issue.
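
The time-of-day argument can be made concrete with a minimal sketch. It assumes a single definitive value and a set of redundant copies held by other systems; the names RedundantCopy and find_integrity_violations, and the sample systems and values, are hypothetical and serve only to illustrate the point.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedundantCopy:
    """One redundant copy of a value, held by some system (hypothetical type)."""
    system: str
    value: str


def find_integrity_violations(definitive_value, copies):
    """Return the systems whose copy disagrees with the definitive source."""
    return [c.system for c in copies if c.value != definitive_value]


# Illustrative data: three redundant sources of the time of day.
copies = [
    RedundantCopy("internet", "12:00"),
    RedundantCopy("radio", "12:00"),
    RedundantCopy("television", "11:58"),
]

violations = find_integrity_violations("12:00", copies)
print(violations)
```

Redundancy by itself raises no flag here. Only a copy that departs from the definitive value is reported, which mirrors the argument that redundancy is a problem only when integrity is in doubt.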


The System of Record


Chapter 8.4: Data Architecture: A High-Level Perspective