
The Great Divide of Data


It is not at all obvious, but the dividing line within unstructured data, between
repetitive data and nonrepetitive data, is very significant. In fact, this dividing line is so
important that it can be called the “great divide” of data.


Fig. 1.1.4 shows the great divide of data.


Fig. 1.1.4 The great divide.

It is hardly obvious why there should be this great divide of data. But there are some very
good reasons for the divide:


Repetitive data usually have very limited business value, while nonrepetitive data are rich in business
value.
Repetitive data can be handled one way; nonrepetitive data are handled very differently.
Repetitive data can be analyzed one way, while nonrepetitive data can be analyzed in a very different
manner.
And so forth.

The two worlds—of repetitive data and of nonrepetitive data—are as different as chalk
and cheese. Tools and techniques that work in one world simply are not applicable to the
other world and vice versa.


In many ways, the great divide of data is as profound as the Continental Divide. At the
Continental Divide, snow that falls on one side ends up as water that flows to
the Pacific Ocean, whereas snow that falls on the other side of the divide ends up heading


Chapter 1.1: An Introduction to Data Architecture