Chapter 9.2
Analyzing Repetitive Data
Abstract
There are many facets to the analysis of repetitive data. Repetitive data are found in
open-ended continuous systems, and repetitive analytics is also done in project-based
environments. A common practice in repetitive analytics is looking for patterns. One
issue that always arises in repetitive pattern analysis is the occurrence of false
positives. A useful approach to repetitive analytics is to create what is known as the
“sandbox.” Analysis in the sandbox does not go outside of the corporation. On the other
hand, the analyst is not constrained in the analysis that is done or in what data can be
analyzed. Log tapes often provide a basis for repetitive data analytics.
Keywords
Repetitive data; Open-ended continuous system; Project-based system; Pattern analysis;
Outliers; False positives; The “sandbox”; Log tapes
Much of the data found in big data are repetitive. Analyzing repetitive data in the big data
environment is quite different from analyzing data in the nonrepetitive environment. As a
point of departure, we need to examine what the repetitive big data environment looks like.
Fig. 9.2.1 shows that data in the repetitive big data environment look like lots of units of
data laid end to end.