Fig. 10.1.15 Associative word processing.
As an example of associative word processing, consider the following raw text:
Contract ABC, requirement section, required conferences—every two weeks,...
The output to the analytic database might look like the following:
Document name, byte, context—scheduled meeting, value—required conference
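The shape of that output record can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ASSOCIATIONS` table, the `AssociativeRecord` type, the `associate` function, and the document name `contract_abc.txt` are hypothetical names introduced here, not part of any textual ETL product, and the byte is taken to be the UTF-8 offset of the matched phrase.

```python
import re
from typing import NamedTuple


class AssociativeRecord(NamedTuple):
    document: str  # document name
    byte: int      # byte offset of the matched phrase in the raw text
    context: str   # context the phrase is associated with
    value: str     # normalized value written to the analytic database


# Hypothetical association table: raw phrase -> (context, value).
ASSOCIATIONS = {
    "required conferences": ("scheduled meeting", "required conference"),
}


def associate(document, text, associations=ASSOCIATIONS):
    """Return one record per occurrence of an associated phrase."""
    records = []
    for phrase, (context, value) in associations.items():
        for match in re.finditer(re.escape(phrase), text, re.IGNORECASE):
            # Convert the character index into a UTF-8 byte offset.
            byte = len(text[:match.start()].encode("utf-8"))
            records.append(AssociativeRecord(document, byte, context, value))
    return sorted(records, key=lambda record: record.byte)


raw = "Contract ABC, requirement section, required conferences, every two weeks"
for record in associate("contract_abc.txt", raw):
    print(record)
```

Each record carries the four fields named in the output above: document name, byte, context, and value.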
Stop Word Processing
Perhaps the most straightforward processing done in textual ETL is stop word
processing. Stop words are words that are necessary for proper grammar but
contribute little to the meaning of what is being said. Typical
English stop words are “a,” “and,” “the,” “is,” “that,” “what,” “for,” “to,” “by,” and so
forth. Typical stop words in Spanish include “el,” “la,” “es,” “de,” “que,” and “y.” All
Latin-based languages have stop words.
During textual ETL processing, stop words are removed from the raw text.
The analyst has the opportunity to customize the stop word list that is shipped with the
product.
Removing stop words reduces the overhead of processing raw text with textual
ETL.
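Stop word removal and a customizable stop word list can be sketched as follows. This is a minimal illustration, not the product's implementation: the function names `remove_stop_words` and `customize_stop_words` are hypothetical, the default list contains only the English examples given above, and tokenization is a simple word-character split.

```python
import re

# Default list: only the English examples named in the text (an assumption;
# a shipped list would be much longer).
DEFAULT_STOP_WORDS = frozenset(
    {"a", "and", "the", "is", "that", "what", "for", "to", "by"}
)


def remove_stop_words(text, stop_words=DEFAULT_STOP_WORDS):
    """Split text into words and drop those found in the stop word list."""
    tokens = re.findall(r"\w+", text)
    return [token for token in tokens if token.lower() not in stop_words]


def customize_stop_words(base, add=(), remove=()):
    """Return a new stop word list with words added to or removed from base."""
    added = {word.lower() for word in add}
    removed = {word.lower() for word in remove}
    return frozenset((set(base) | added) - removed)


raw = "The contract is signed by the vendor and the buyer"
print(remove_stop_words(raw))

# An analyst might treat "vendor" as noise but want to keep "by".
custom = customize_stop_words(DEFAULT_STOP_WORDS, add=["vendor"], remove=["by"])
print(remove_stop_words(raw, custom))
```

Keeping the list as a set makes each lookup constant time, so filtering costs little while shrinking the text that later stages must process.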
Fig. 10.1.16 shows raw text that is being processed for stop words by textual ETL.
Chapter 10.1: Nonrepetitive Data