
Fig. 10.1.23 Preprocessing and postprocessing.

Textual ETL is designed to do as much processing as possible within the scope of the
program. Neither preprocessing nor postprocessing is a normal part of the workflow,
because running either one as a separate step raises the overall processing overhead.


When preprocessing is necessary, it can include activities such as the following:


Filtering unwanted and unneeded data
Fuzzy logic repair of data
Classification of data
Raw editing of data
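
The four activities listed above can be sketched as a small pipeline. This is only an illustrative sketch, not the way textual ETL implements its preprocessor: every function name, the term list, and the category table are hypothetical, the order of the steps is an assumption, and approximate string matching from Python's difflib module stands in for fuzzy logic repair.

```python
import difflib
import re

# Hypothetical reference data, invented for this illustration.
KNOWN_TERMS = ["invoice", "contract", "payment"]
CATEGORIES = {"invoice": "billing", "payment": "billing", "contract": "legal"}


def raw_edit(line: str) -> str:
    """Raw editing: replace control characters and collapse whitespace."""
    line = re.sub(r"[\x00-\x1f\x7f]", " ", line)
    return re.sub(r"\s+", " ", line).strip()


def is_wanted(line: str) -> bool:
    """Filtering: keep only lines that contain at least one letter."""
    return any(ch.isalpha() for ch in line)


def fuzzy_repair(line: str) -> str:
    """Repair: swap near-miss spellings for the closest known term.

    A matched word is replaced by the lowercase known term.
    """
    repaired = []
    for word in line.split():
        match = difflib.get_close_matches(
            word.lower(), KNOWN_TERMS, n=1, cutoff=0.8
        )
        repaired.append(match[0] if match else word)
    return " ".join(repaired)


def classify(line: str) -> str:
    """Classification: label a line by the first known term it contains."""
    for word in line.lower().split():
        if word in CATEGORIES:
            return CATEGORIES[word]
    return "unclassified"


def preprocess(lines):
    """Run the assumed order: raw edit, filter, repair, classify."""
    result = []
    for line in lines:
        line = raw_edit(line)
        if not is_wanted(line):
            continue  # unwanted data is dropped before further work
        line = fuzzy_repair(line)
        result.append((classify(line), line))
    return result


if __name__ == "__main__":
    sample = ["  invoise   1042\t", "-----", "contract signed"]
    for category, text in preprocess(sample):
        print(category, "|", text)
```

In this sketch the separator line is filtered out, the misspelled "invoise" is repaired to "invoice", and each surviving line receives a category. Every line passes through all four steps before the main program sees it, which illustrates where the extra overhead of a separate preprocessing pass comes from.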

Fig. 10.1.24 shows the processing that occurs inside the preprocessor.


Fig. 10.1.24 Preprocessor.

Chapter 10.1: Nonrepetitive Data