Analytic processing against the repetitive big data environment can be classified in one
of two ways: what can be termed “distillation” processing and what can be termed
“filtering” processing. Either type can be done, depending on the needs of the analyst.
In distillation processing, the processing produces a single set of results, such as a
profile. In retail operations, the goal might be to create a profile of normal customer
behavior. In banking, the result of distillation might be a new lending rate. In
manufacturing, the result might be a determination of the best materials for manufacture.
In each case, the distillation process produces a single occurrence of a set of
values.
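To make the idea concrete, the following Python sketch shows distillation: many records go in and a single set of values comes out. The `Purchase` record, the `distill_profile` function, the sample data, and the particular statistics chosen for the profile are all hypothetical and serve only as an illustration; a real retail profile would be built from whatever measures the analyst needs.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean, median, pstdev


@dataclass(frozen=True)
class Purchase:
    # Hypothetical record layout for one repetitive retail transaction.
    customer_id: str
    amount: float
    category: str


def distill_profile(purchases: list[Purchase]) -> dict:
    """Reduce many purchase records to one profile (a single set of values)."""
    if not purchases:
        raise ValueError("distillation needs at least one record")
    amounts = [p.amount for p in purchases]
    categories = Counter(p.category for p in purchases)
    return {
        "records": len(purchases),
        "mean_amount": mean(amounts),
        "median_amount": median(amounts),
        "stdev_amount": pstdev(amounts),  # population standard deviation
        "top_category": categories.most_common(1)[0][0],
    }


# Hypothetical sample data, invented for illustration only.
purchases = [
    Purchase("c1", 20.0, "grocery"),
    Purchase("c2", 35.0, "grocery"),
    Purchase("c1", 120.0, "electronics"),
    Purchase("c3", 25.0, "grocery"),
]

profile = distill_profile(purchases)
print(profile)
```

However many records are fed in, the output is one dictionary, which is the defining trait of distillation.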
Filtering produces quite different results. In filtering, the result of processing is the
selection and refinement of multiple records. The objective is to find all records that
satisfy some criteria. Once those records have been found, they can be edited,
manipulated, or otherwise altered to suit the needs of the analyst. The records are then
output for further processing or analysis.
In a retail environment, the result of filtering might be the selection of all high-value
customers. In manufacturing, the result might be the selection of all end products that
failed quality tests. In health care, the result might be the selection of all patients
afflicted with a certain condition, and so forth.
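A comparable Python sketch of filtering, using the retail example of high-value customers, selects every record that meets a criterion, edits each selected record, and outputs multiple records rather than one summary. The names `Customer`, `filter_records`, and `HIGH_VALUE_THRESHOLD`, the threshold value, and the sample data are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Customer:
    # Hypothetical customer record; segment is filled in by the edit step.
    customer_id: str
    name: str
    annual_spend: float
    segment: str = ""


def filter_records(
    records: Iterable[Customer],
    predicate: Callable[[Customer], bool],
    edit: Callable[[Customer], Customer],
) -> Iterator[Customer]:
    """Select every record meeting the criteria, then edit each selected one."""
    for record in records:
        if predicate(record):
            yield edit(record)


HIGH_VALUE_THRESHOLD = 10_000.0  # hypothetical cutoff for "high-value"

# Hypothetical sample data, invented for illustration only.
customers = [
    Customer("c1", "  ada lovelace ", 15_000.0),
    Customer("c2", "alan turing", 800.0),
    Customer("c3", "GRACE HOPPER", 22_000.0),
]

high_value = list(
    filter_records(
        customers,
        predicate=lambda c: c.annual_spend >= HIGH_VALUE_THRESHOLD,
        # Edit step: tidy the name and tag the segment on a new copy.
        edit=lambda c: replace(c, name=c.name.strip().title(), segment="high-value"),
    )
)

for customer in high_value:
    print(customer)
```

Because `filter_records` is a generator, it can stream through a large collection of repetitive records without holding them all in memory. Passing in the predicate and the edit step means the same routine could serve the manufacturing or health care examples simply by supplying different criteria.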
The processing that occurs in distillation is quite different from the processing that
occurs in filtering. The emphasis in distillation is on analytic and algorithmic processing,
whereas the emphasis in filtering is on the selection and editing of records.
Fig. 9.1.10 illustrates the types of processing that can be done against repetitive data.
Chapter 9.1: Repetitive Analytics: Some Basics