Chapter 9.3
Repetitive Analysis
Abstract
There are many facets to the analysis of repetitive data. Repetitive data are found in
open-ended continuous systems, and repetitive analytics is also done in project-based
environments. A common practice in repetitive analytics is looking for patterns. One
issue that always arises with repetitive pattern analysis is the occurrence of false
positives. A useful approach to repetitive analytics is to create what is known as the
“sandbox.”
Analysis in the sandbox does not go outside of the corporation. On the other hand, the
analyst is not constrained with regard to the analysis that is done or what data can be
analyzed. Log tapes often provide a basis for repetitive data analytics.
Keywords
Repetitive data; Open-ended continuous system; Project-based system; Pattern analysis;
Outliers; False positives; The “sandbox”; Log tapes
Internal, External Data
Because storage is so inexpensive with big data, it is possible to consider storing data
that come from sources other than internal ones.
In an earlier day, the cost of storage was such that the only data corporations
considered storing were internally generated data. But with the cost of storage
diminished by the advent of big data, it is now possible to consider storing external data
as well as internal data.
One of the issues with storing external data is that of finding and using identifiers. But
textual disambiguation can be used on external data just as it can on internal data, so
it is entirely possible to establish discrete identifiers for external data.
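The idea of a discrete identifier for external data can be illustrated with a small sketch. It is not textual disambiguation itself, which is a far richer process; it only assumes that an external record carries a free-text company name and shows one simple way that differently written names might be reduced to the same key. The function names, the suffix list, and the "ext-" prefix are all hypothetical choices made for this illustration.

```python
import hashlib
import re

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "ltd", "llc"}


def normalize_name(raw: str) -> str:
    """Lowercase the name, drop punctuation, and remove common corporate suffixes."""
    text = raw.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace punctuation with spaces
    tokens = [t for t in text.split() if t not in SUFFIXES]
    return " ".join(tokens)


def external_id(raw: str) -> str:
    """Derive a stable identifier from the normalized name (illustrative only)."""
    key = normalize_name(raw)
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return "ext-" + digest[:12]


if __name__ == "__main__":
    # Two spellings of the same external company map to one identifier.
    print(external_id("Acme Corp."))
    print(external_id("ACME Corporation"))
```

In this sketch, two external records that spell the same company differently receive the same identifier, which is what allows them to be related to one another and to internal data. A real implementation would need far more context than a name string to avoid treating different companies as the same one.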
Fig. 9.3.1 shows that storing external data in big data is a real possibility.