
There has been much conjecture about how much of the data in the corporation are
structured and how much are unstructured. Estimates run as low as 2% and as high as 20%.
The estimate depends on the nature of the corporation's business and on which data are
included in the calculation.


Repetitive/Nonrepetitive Unstructured Data


There are two basic kinds of unstructured data in the corporation—repetitive
unstructured data and nonrepetitive unstructured data.


Fig. 1.1.3 depicts the different kinds of unstructured data in the corporation.


Fig. 1.1.3 Repetitive data and nonrepetitive data.

A typical form of repetitive unstructured data in the corporation might be the data
generated by an analog machine. For example, a farmer has a machine that reads the
identification of railroad cars as they pass through the farmer's property. Trains pass
through the property night and day, and the electronic eye reads and records the passage
of each car on the track. Every record has the same form as the one before it; only the
car's identification changes.


Nonrepetitive unstructured data are data in which no record resembles any other in form or
content. E-mail is a typical example. Each e-mail can be long or short. The e-mail can be
in English or Spanish (or some other language). The author of the e-mail can say anything
that he or she pleases. It is only a pure accident if the contents of any e-mail are
identical to the contents of any other e-mail. There are many other forms of nonrepetitive
unstructured data: voice recordings, contracts, customer feedback messages, and so on.
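
The contrast between the two kinds of data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CarPassage record, its field names, and all sample values are hypothetical and do not come from any real system. The point is that the railroad car readings all share one shape, while the e-mails share nothing but the fact that they are text.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CarPassage:
    """Hypothetical record written by the trackside reader for one railroad car."""
    car_id: str
    timestamp: str  # ISO 8601 text, kept as a plain string for simplicity


# Repetitive: every record has the same fields; only the values differ.
passages = [
    CarPassage("CAR-0001", "2024-01-01T02:15:00"),
    CarPassage("CAR-0002", "2024-01-01T02:15:04"),
    CarPassage("CAR-0003", "2024-01-01T02:15:09"),
]

# Nonrepetitive: free text of any length, in any language, on any subject.
emails = [
    "Hi Ana, the contract is attached. Please sign by Friday.",
    "Hola, ¿podemos mover la reunión al martes?",
    "Thanks!",
]


def record_shapes(records):
    """Return the set of distinct field-name tuples found across the records."""
    return {tuple(f.name for f in fields(r)) for r in records}


shapes = record_shapes(passages)      # a single shape shared by every reading
lengths = [len(e) for e in emails]    # lengths vary from message to message

print("distinct shapes among car readings:", len(shapes))
print("e-mail lengths:", lengths)
```

The car readings collapse to a single shape, so they can be processed uniformly. The e-mails offer no comparable regularity to exploit.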


Because of their irregular form, unstructured data do not fit well with standard database
management systems.


Chapter 1.1: An Introduction to Data Architecture