The Processing Components of Textual ETL
From a processing standpoint, textual ETL has two major components: document fracturing
and named value processing (sometimes called “inline contextualization”).
Fig. 17.1.6 shows these two major divisions of the processing that occurs within textual
ETL.
Fig. 17.1.6 The components of textual ETL.
In document fracturing, a document is processed in such a way that it remains in a
recognizable state after processing. In named value processing, the document is also
processed, but the document itself is no longer recognizable at the end of processing.
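The contrast between the two components can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the way any textual ETL product is implemented: the stop-word list, the rule names (contract_number, amount), the delimiter patterns, and the sample sentence are all hypothetical. Here, fracturing is assumed to mean dropping stop words while keeping the remaining words in order, and named value processing is assumed to mean pulling out values that follow a known delimiter.

```python
import re

# Hypothetical stop-word list; a real one would be far larger.
STOP_WORDS = {"the", "a", "an", "is", "of", "on", "was", "for"}

def fracture(text):
    """Document fracturing (sketch): keep every non-stop word, in order,
    with its character offset, so the document stays recognizable."""
    rows = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        word = match.group().lower()
        if word not in STOP_WORDS:
            rows.append((match.start(), word))
    return rows

# Hypothetical named-value rules: the literal text in each pattern acts
# as the delimiter, and the capture group is the value.
NAMED_VALUE_RULES = {
    "contract_number": re.compile(r"Contract No\.\s*(\w+)"),
    "amount": re.compile(r"\$(\d+)"),
}

def named_values(text):
    """Named value processing (sketch): emit only (offset, name, value)
    rows; the rest of the document is discarded."""
    rows = []
    for name, pattern in NAMED_VALUE_RULES.items():
        for match in pattern.finditer(text):
            rows.append((match.start(1), name, match.group(1)))
    return sorted(rows)

doc = "Contract No. 4417 was signed for the amount of $12500 on the premises."
fractured = fracture(doc)   # most of the sentence survives, in order
values = named_values(doc)  # only two name/value pairs survive
```

Reading the words of fractured in sequence still conveys the sentence, whereas values holds only the contract number and the amount, from which the original document could not be recognized.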
Secondary Analysis
Textual ETL is only the first step in the analysis of text. It gathers the information,
contextualizes it, and produces a simple file. To do textual analysis, that file must then
be processed further.
Fig. 17.1.7 shows that the output from textual ETL goes through a secondary analysis.
Typical secondary processing includes such activities as sentiment analysis, medical
record analysis and reconstruction, call center analysis, and other types of analysis.
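As a rough illustration of a secondary analysis, the sketch below reads a small file of the kind textual ETL might produce and computes a toy sentiment score per document. Everything specific here is an assumption made for the example: the column layout (document, offset, word), the sample rows, and the two word lists. A real sentiment analysis would be considerably more involved.

```python
import csv
import io
from collections import Counter

# Hypothetical textual ETL output: one row per word of interest.
# The column layout is an assumption made for this sketch.
ETL_OUTPUT = """document,offset,word
call_001,4,helpful
call_001,19,slow
call_002,7,rude
call_002,15,slow
call_003,2,helpful
"""

# Hypothetical sentiment word lists.
POSITIVE = {"helpful", "friendly"}
NEGATIVE = {"slow", "rude"}

def sentiment_by_document(etl_file):
    """Add 1 for each positive word and subtract 1 for each negative
    word, grouped by document."""
    scores = Counter()
    for row in csv.DictReader(etl_file):
        if row["word"] in POSITIVE:
            scores[row["document"]] += 1
        elif row["word"] in NEGATIVE:
            scores[row["document"]] -= 1
    return dict(scores)

scores = sentiment_by_document(io.StringIO(ETL_OUTPUT))
```

The point of the sketch is the division of labor: the secondary step never touches raw text, only the simple, already contextualized file.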
Chapter 17.1: Managing Text