Fig. 9.1.10 Distillation and filtering.
Subsetting Data
One of the results of filtering is the creation of subsets of data. As repetitive data are read
and filtered, the records are separated into different subsets. There are many
practical reasons for subsetting data. Some of those reasons are the following:
- Reduction in the volume of data that have to be analyzed. It is much easier to analyze and manipulate a small subset of data than it is to analyze the same data mixed in with many irrelevant occurrences of data.
- Purity of processing. By subsetting data, the analyst can filter out unwanted data so that the analysis can focus on the data that are of interest. Working from a subset keeps the analytic processing focused on the objective of the analysis.
- Security. Once data are selected into a subset, they can be protected with even higher levels of security than when they existed in an unfiltered state.
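The filtering and subsetting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`id`, `region`, `amount`) and the helper names `subset_by` and `filter_records` are hypothetical and do not come from the text.

```python
from collections import defaultdict

# Hypothetical repetitive records; the field names are assumptions for illustration.
records = [
    {"id": 1, "region": "west", "amount": 120.0},
    {"id": 2, "region": "east", "amount": 15.5},
    {"id": 3, "region": "west", "amount": 42.0},
    {"id": 4, "region": "north", "amount": 310.0},
    {"id": 5, "region": "east", "amount": 99.9},
]


def subset_by(records, key):
    """Read each record once and route it into the subset named by record[key]."""
    subsets = defaultdict(list)
    for record in records:
        subsets[record[key]].append(record)
    return dict(subsets)


def filter_records(records, predicate):
    """Keep only the records of interest; all other records are filtered out."""
    return [record for record in records if predicate(record)]


# One pass over the data produces several subsets.
by_region = subset_by(records, "region")

# Further filtering inside one subset narrows the analysis to the data of interest.
large_west = filter_records(by_region["west"], lambda r: r["amount"] >= 100)
```

Any analysis that follows can then run against `by_region["west"]` or `large_west` alone instead of against every record that was read.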
Subsetting data for analysis is a commonly used technique, and it has been in use for as
long as there have been data and computers.
One use of subsetting data is to set the stage for sampling.
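As a rough sketch of how a sample might be drawn from a subset, the example below uses simple random sampling from Python's standard library. The helper `sample_subset`, the 1% sampling fraction, and the synthetic data are assumptions made for illustration; the text does not prescribe a particular sampling method.

```python
import random


def sample_subset(subset, fraction, seed=0):
    """Return a simple random sample holding roughly `fraction` of the subset."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    if not subset:
        return []
    size = max(1, round(len(subset) * fraction))
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return rng.sample(subset, size)  # sampling without replacement


# Hypothetical subset: 10,000 synthetic transaction amounts.
subset = [float(i % 500) for i in range(10_000)]

# The analysis runs against 100 values instead of 10,000.
sample = sample_subset(subset, fraction=0.01)

full_mean = sum(subset) / len(subset)
sample_mean = sum(sample) / len(sample)
```

Here `sample_mean` is only an estimate of `full_mean`; it is computed from a fraction of the data and will generally differ somewhat from the full result.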
In data sampling, processing runs against a sample of the data rather than against the full set
of data. As a result, considerably fewer resources are needed to create the analysis, and
the time that it takes to create the analysis is significantly reduced. And in heuristic
Chapter 9.1: Repetitive Analytics: Some Basics