very good reasons for managing metadata differently. In big data, it often makes sense to
store the descriptive metadata physically in the same location and same data set as the
data being described.
There are several very good reasons for storing metadata in the same physical location
as the data itself. Some of those reasons are the following:
Storage is cheap. There is no reason why the cost of storage needed to store the metadata should ever be
an issue.
The world of big data is undisciplined. Having the metadata stored directly with the data being
described means that the metadata will never be lost or misplaced.
Metadata change over time. When the metadata are stored directly with the data being described, there
is ALWAYS a direct relationship between the metadata and the data being described. In other words,
the metadata NEVER go out of sync with the data being described.
Simplicity of processing. When the analyst starts to process data in big data, there is never a search for
the metadata. The metadata are always easy to locate because they are always with the data being described.
Fig. 9.2.13 shows that embedding metadata along with the data stored in big data is a
good idea.
Fig. 9.2.13 Embedded metadata is a good idea.
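As a rough illustration of what embedding might look like in practice, the following Python sketch writes a data set as JSON Lines, with the descriptive metadata as the first line of the same data set. The layout, the `_metadata` key, the function names, and the sample fields are all assumptions made for this example; they are not a format prescribed here.

```python
import json
from typing import Any, Dict, Iterable, List, Tuple


def dump_with_metadata(metadata: Dict[str, Any],
                       records: Iterable[Dict[str, Any]]) -> str:
    """Serialize a data set as JSON Lines with the metadata as the first line."""
    lines = [json.dumps({"_metadata": metadata})]
    lines.extend(json.dumps(record) for record in records)
    return "\n".join(lines) + "\n"


def load_with_metadata(text: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Read the metadata header and the records back from the same text."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty data set: no metadata header found")
    header = json.loads(lines[0])
    if "_metadata" not in header:
        raise ValueError("first line is not a metadata header")
    records = [json.loads(line) for line in lines[1:]]
    return header["_metadata"], records


# Hypothetical example data: field names and units are illustrative only.
metadata = {
    "fields": {"meter_id": "string", "kwh": "float, kilowatt-hours"},
    "version": 1,
}
records = [{"meter_id": "A17", "kwh": 3.2}, {"meter_id": "B04", "kwh": 1.9}]

text = dump_with_metadata(metadata, records)
loaded_metadata, loaded_records = load_with_metadata(text)
```

Because the description travels in the same data set as the records, the analyst who reads the records reads the metadata in the same pass, with no separate lookup.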
Note that storing metadata directly with the data in big data does not preclude the
possibility of having a repository of metadata for big data. There is nothing to say that
metadata cannot be stored with the data in big data AND reside in a repository as well.
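One way to picture the two approaches coexisting is sketched below in Python. The repository is built by copying the metadata embedded in each data set, so the embedded copy remains the one that is always in step with the data. The data set names, the dictionary layout, and both function names are hypothetical and are used only for illustration.

```python
import copy
from typing import Any, Dict, List

DataSet = Dict[str, Any]


def build_repository(data_sets: Dict[str, DataSet]) -> Dict[str, Dict[str, Any]]:
    """Copy each data set's embedded metadata into a central index keyed by name."""
    return {name: copy.deepcopy(ds["metadata"]) for name, ds in data_sets.items()}


def find_stale_entries(repository: Dict[str, Dict[str, Any]],
                       data_sets: Dict[str, DataSet]) -> List[str]:
    """Return the names whose repository copy no longer matches the embedded metadata."""
    return sorted(
        name for name, ds in data_sets.items()
        if repository.get(name) != ds["metadata"]
    )


# Hypothetical data sets; names and fields are illustrative only.
data_sets = {
    "meter_readings": {
        "metadata": {"version": 1, "fields": ["meter_id", "kwh"]},
        "records": [{"meter_id": "A17", "kwh": 3.2}],
    },
    "call_records": {
        "metadata": {"version": 1, "fields": ["caller", "seconds"]},
        "records": [{"caller": "555-0100", "seconds": 42}],
    },
}

repository = build_repository(data_sets)
stale_before = find_stale_entries(repository, data_sets)

# The embedded metadata change; the repository copy is now out of date.
data_sets["meter_readings"]["metadata"]["version"] = 2
stale_after = find_stale_entries(repository, data_sets)
```

In this sketch the repository can drift when the metadata change, while the embedded copy cannot, which is why the repository is treated as a convenience built from the embedded metadata and refreshed from them.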
Linking Data
Chapter 9.2: Analyzing Repetitive Data