Big data is
data that is stored in very large volumes,
data that is stored on inexpensive storage,
data that is managed by the “Roman census” method, and
data that is stored and managed in an unstructured format.

These then are the defining characteristics of big data that will be used in this book.


Each of these characteristics deserves a fuller explanation.


Large Volumes


Most organizations already have enough data to run their day-to-day business.
But some organizations have an extraordinary amount of data and need to look at
such things as the following:


All the data on the Internet
Meteorological data sent down by a satellite
All of the e-mails in the world
Manufacturing data generated by an analog computer
Data about railroad cars as they traverse tracks
Many more applications

For these organizations, there is no good, inexpensive way to store and manage such
data with conventional technology. Even if the data could be stored in a standard
DBMS, the cost of storage would be exorbitant. So some organizations need a way to
store and manage very large amounts of data.


When an organization faces the task of managing very large amounts of data, the
question of business value arises. The fundamental question, “What business value is
there in being able to look at massive volumes of data?”, needs to be addressed. The
old saw of “build it and they will come” does not apply to large amounts of data.
Before the organization sets out to store massive amounts of data, there needs to be a
good understanding of what business value lies in the data itself.


Inexpensive Storage


Even if big data were able to store and manage massive amounts of data, it would not be
practical to create huge stores if the storage medium were expensive. Stated another
way, if big data stored data on only expensive high-performance


Chapter 4.2: What Is Big Data?